1  Report Summary and Organization


Nonlinear functions of binary (and general integer) variables arise naturally when modeling
selections and interactions within a unified optimization framework. For example, consider a set of
$n$ objects $\{1, \dots, n\}$. With each pair of objects $(i, j)$ we associate a weight $q_{ij}$
measuring the interaction between objects $i$ and $j$. Let $x_i = 1$ if object $i$ is selected (or
$x_i = k$, $k \in \mathbb{Z}_{++}$, if several copies of the object may be selected), and $x_i = 0$
otherwise. If the global interaction is the sum of all interactions between the selected objects,
then it can be formulated as the quadratic function $\sum_{i=1}^{n} \sum_{j=1}^{n} q_{ij} x_i x_j$.
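As a small illustration of the quadratic interaction function above, the following sketch evaluates $\sum_i \sum_j q_{ij} x_i x_j$ for a given selection vector. The function name `interaction` and the weight matrix `q` are made up for this example and do not come from the report.

```python
def interaction(q, x):
    """Return sum_i sum_j q[i][j] * x[i] * x[j] for an integer selection vector x."""
    n = len(x)
    return sum(q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


# Hypothetical symmetric interaction weights for three objects.
q = [[0, 2, -1],
     [2, 0, 3],
     [-1, 3, 0]]

print(interaction(q, [1, 1, 0]))  # objects 1 and 2 selected once each
print(interaction(q, [1, 0, 2]))  # object 1 once, two copies of object 3
```

Because the double sum runs over all ordered pairs, each unordered pair contributes twice when `q` is symmetric.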

Pseudo-boolean (and general nonlinear integer) functions provide an extremely powerful modeling
and solution tool in operations research and related areas. A large number of practical as well
as purely theoretical decision problems can be easily represented and solved as the optimization
of a pseudo-boolean (or general nonlinear integer) function.

Given the recent advances in computational stochastic discrete optimization and the continuing
pursuit of cost-effectiveness when making complex decisions in markedly stochastic, dynamic
environments, we have witnessed many applications of stochastic programming to real-world problems
such as capacity planning, facility location, and production control. Stochastic programming
provides a simple optimization model of decision making under uncertainty that overcomes many
limitations of classic deterministic approaches. However, many stochastic combinatorial
optimization problems are notoriously difficult to solve. This fact has greatly hindered the
further application of stochastic programming to problems that involve discrete decisions.

We have also seen an increasing number of nonlinear models for combinatorial optimization
problems, e.g., different types of nonlinear assignment problems and applications of various
quadratic and fractional binary programming models in medicine, chemistry, computational biology,
and data mining. These models are much better suited to real-world problems full of nonlinearity,
and they are better able to capture interactions among the entities involved. However, very
little work has been done to extend these results to stochastic dynamic environments.

Therefore, the major goal of the project was to investigate stochastic pseudo-boolean (and general
nonlinear integer) optimization problems. The remainder of this report is organized as follows:

•  Section 2 provides a novel solution approach for a broad class of two-stage stochastic
quadratic integer programs. The proposed approach is based on the value function reformulation
of the original problem. We show that our approach can solve instances whose extensive forms
are hundreds of orders of magnitude larger than the largest quadratic integer programming
instances solved in the literature.

•  Section  3  describes  new  Lagrangian  based  approaches  for  solving  two-stage  stochastic  and 
deterministic  quadratic  binary  programs. 

•  Section 4 is focused on two-stage stochastic extensions of the classical minimum $s$-$t$ cut
problem. The deterministic minimum $s$-$t$ cut problem has two equivalent formulations that
are motivated by two different interpretations of the problem: (i) a linear 0-1 program (arc-based
interpretation) and (ii) a quadratic 0-1 program (node-partitioning-based interpretation). We
show that stochastic extensions of these equivalent deterministic models result in two different
stochastic optimization problems. We discuss the corresponding mathematical programming
formulations and related computational complexity issues.

•  Section 5 considers a specific stochastic extension of the bilevel knapsack problem. Bilevel
and multilevel optimization is extremely important in military and law enforcement applications
since it naturally models the adversarial relationship between the upper-level (the
attacker/defender) and lower-level (the defender/attacker) decision makers.


•  Section  6  describes  greedy  approximation  algorithms  for  solving  a  class  of  two-stage  stochastic 
assignment  problems. 

•  Section 7 provides a global optimization algorithm for solving a class of multiple-ratio
fractional combinatorial optimization problems. Multiple-ratio problems often arise when one
considers stochastic extensions of single-ratio fractional problems.

•  Section  8  describes  a  nonlinear  integer  optimization  model  for  the  irregular  polyomino  tiling 
problem.  It  is  motivated  by  an  antenna  design  application. 


In  Section  9  we  list  the  participants  of  the  projects.  Sections  10  and  11  summarize  the  most 
important  refereed  journal  publications  and  research  conference  presentations,  respectively. 

We  should  emphasize  that  according  to  the  performance  report  requirements  we  do  not  provide 
copies  of  already  published  articles  in  this  report. 
2  Two-Stage Stochastic Quadratic Integer Programs

The  details  of  the  work  in  this  chapter  can  be  found  in: 

•  O.Y.  Ozaltin,  O.A.  Prokopyev,  A.J.  Schaefer,  “Two-Stage  Quadratic  Integer  Programs  with 
Stochastic  Right-Hand  Sides,”  Mathematical  Programming ,  accepted  for  publication,  2010. 


2.1  Introduction 


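We consider the following class of two-stage quadratic integer programs with stochastic right-hand
sides:

$$(P1):\quad \max\ \tfrac{1}{2} x^T \Lambda x + c^T x + \mathbb{E}_{\omega} Q(x, \omega) \tag{1a}$$

$$\text{subject to}\quad x \in X, \tag{1b}$$

where $X = \{x \in \mathbb{Z}_+^{n_1} \mid Ax \le b\}$ and

$$Q(x, \omega) = \max\ \tfrac{1}{2} y^T \Gamma y + d^T y \tag{2a}$$

$$\text{subject to}\quad Wy \le h(\omega) - Tx, \tag{2b}$$

$$y \in \mathbb{Z}_+^{n_2}. \tag{2c}$$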


The random variable $\omega$ from probability space $(\Omega, \mathcal{F}, \mathcal{P})$ describes
the realizations of uncertain parameters, known as scenarios. The numbers of constraints and
decision variables in stage $i$ are $m_i$ and $n_i$, respectively, for $i = 1, 2$. The first-stage
objective vector $c \in \mathbb{R}^{n_1}$, right-hand side vector $b \in \mathbb{R}^{m_1}$ and the
second-stage objective vector $d \in \mathbb{R}^{n_2}$ are known column vectors. The first-stage
constraint matrix $A \in \mathbb{R}^{m_1 \times n_1}$, technology matrix
$T \in \mathbb{R}^{m_2 \times n_1}$ and recourse matrix $W \in \mathbb{R}^{m_2 \times n_2}$ are all
deterministic. Furthermore, $\Lambda \in \mathbb{R}^{n_1 \times n_1}$ and
$\Gamma \in \mathbb{R}^{n_2 \times n_2}$ are known, and possibly indefinite, symmetric matrices.
The stochastic component consists of only $h(\omega) \in \mathbb{R}^{m_2}$ for all $\omega \in \Omega$.

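The extensive form formulation of (P1) is given by:

$$\max\ \tfrac{1}{2} x^T \Lambda x + c^T x + \mathbb{E}_{\omega}\left[ \tfrac{1}{2} y(\omega)^T \Gamma y(\omega) + d^T y(\omega) \right] \tag{3a}$$

$$\text{subject to}\quad x \in X, \tag{3b}$$

$$W y(\omega) \le h(\omega) - Tx \quad \forall \omega \in \Omega, \tag{3c}$$

$$y(\omega) \in \mathbb{Z}_+^{n_2} \quad \forall \omega \in \Omega. \tag{3d}$$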


In  this  work  we  make  the  following  assumptions: 


A1  The random variable $\omega$ follows a discrete distribution with finite support.

A2  The first-stage feasibility set $X = \{x \in \mathbb{Z}_+^{n_1} \mid Ax \le b\}$ is nonempty and bounded.

A3  $Q(x, \omega)$ is finite for all $x \in X$ and $\omega \in \Omega$.

A4  The first-stage constraint matrix $A$, technology matrix $T$ and recourse matrix $W$ are all
integral, i.e., $A \in \mathbb{Z}^{m_1 \times n_1}$, $T \in \mathbb{Z}^{m_2 \times n_1}$, $W \in \mathbb{Z}^{m_2 \times n_2}$.
Assumption A1 is justified by Schultz [136], who showed that the optimal solution to any
stochastic program with continuously distributed $\omega$ can be approximated to any desired
accuracy using a discrete distribution. Assumption A2 and the integrality restrictions in the first
stage ensure that $X$ is a finite set. Assumption A3 ensures that $Q(x, \omega)$ is feasible for all
$x \in X$ and $\omega \in \Omega$, i.e., relatively complete recourse [148]. Assumption A4 is not
too restrictive, as any rational matrix can be converted to an integral one. Most stochastic
programming studies in the literature make assumptions similar to A1-A3 [7, 34, 89, 137] and
A4 [89]. Without loss of generality, we also assume that $b \in \mathbb{Z}^{m_1}$ and
$h(\omega) \in \mathbb{Z}^{m_2}$ for all $\omega \in \Omega$, as $A$, $T$ and $W$ are all integer
matrices. Note that all of the undesirable properties of stochastic integer programs, e.g.,
discontinuity and nonconvexity of $Q(x, \omega)$, still exist in (P1).

We reformulate (P1) using the value functions of the first- and second-stage quadratic integer
programs. The advantage of this reformulation is that it is relatively insensitive to the number
of variables and scenarios. In the first phase of our solution approach, we construct the value
functions in both stages. In the second phase, we use a global branch-and-bound algorithm or a
level-set approach to optimize (P1) over the set of feasible first-stage right-hand sides.
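The two-phase idea can be sketched on a toy instance. This is only an illustration under stated assumptions and not the algorithm of the cited paper: plain enumeration over a small box replaces the value-function construction and the branch-and-bound or level-set search, the first-stage right-hand side is read as the tender $\beta = Tx$, and all names (`solve_by_tenders`, `ub1`, `ub2`, the dictionary `psi`, the cached function `phi`) are hypothetical.

```python
from itertools import product


def quad(M, v, x):
    """Return 0.5 * x^T M x + v^T x."""
    n = len(x)
    q = sum(M[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return 0.5 * q + sum(v[i] * x[i] for i in range(n))


def matvec(M, x):
    return tuple(sum(row[j] * x[j] for j in range(len(x))) for row in M)


def leq(u, v):
    return all(a <= b for a, b in zip(u, v))


def solve_by_tenders(Lam, c, A, b, T, Gam, d, W, scenarios, ub1, ub2):
    """scenarios is a list of (probability, h) pairs; ub1 and ub2 are assumed variable bounds."""
    # Phase 1a: best first-stage objective value for each attainable tender beta = T x.
    psi = {}
    for x in product(range(ub1 + 1), repeat=len(c)):
        if leq(matvec(A, x), b):
            beta, val = matvec(T, x), quad(Lam, c, x)
            if beta not in psi or val > psi[beta][0]:
                psi[beta] = (val, x)
    # Phase 1b: second-stage value function phi(rhs), evaluated lazily and cached.
    ys = [(matvec(W, y), quad(Gam, d, y)) for y in product(range(ub2 + 1), repeat=len(d))]
    cache = {}

    def phi(rhs):
        if rhs not in cache:
            cache[rhs] = max((v for wy, v in ys if leq(wy, rhs)), default=float("-inf"))
        return cache[rhs]

    # Phase 2: search over tenders instead of over first-stage solutions.
    best = None
    for beta, (val, x) in psi.items():
        rec = sum(p * phi(tuple(hi - bi for hi, bi in zip(h, beta))) for p, h in scenarios)
        if best is None or val + rec > best[0]:
            best = (val + rec, x, beta)
    return best


# Toy instance: one variable and one constraint per stage, two equiprobable scenarios.
print(solve_by_tenders([[-2]], [4], [[1]], [2], [[1]], [[0]], [2], [[1]],
                       [(0.5, (2,)), (0.5, (3,))], ub1=2, ub2=3))
```

The second phase only looks up stored values, which is why the work there depends on the number of distinct tenders and right-hand sides rather than on the number of columns.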

Our approach can solve very large instances of (P1) as measured by the size of the extensive
form. However, it is sensitive to the number of constraints in each stage and the magnitude of
$h(\omega)$. Note that the number of quadratic integer programs that must be solved when
constructing the value function grows exponentially in the number of constraints. A major
contribution of this work is to propose algorithms that can mitigate the effect of this exponential
growth to some extent by exploiting the properties of value functions. Specifically, our approach
can handle instances of (P1) that have up to seven constraints in each stage.

2.2  Contribution 

We develop an algorithmic framework for a class of two-stage stochastic quadratic integer programs
where the uncertainty appears only in the second-stage right-hand sides. The main contribution
of the work is twofold. First, we derive some theoretical properties of quadratic integer
programming (QIP) value functions. These properties may be useful in sensitivity analysis of
quadratic integer programs [43, 73]. Second, we use these properties as well as superadditivity to
develop efficient algorithms for computing value functions of QIPs. We then apply a dual
reformulation and use a generic global branch-and-bound algorithm and a level-set approach to find
an optimal tender.

This work represents an important first step towards more general two-stage stochastic quadratic
integer programs where uncertainty appears in the second-stage objective and constraint matrix,
as well as the right-hand side. We note that our approach is applicable to general two-stage
stochastic quadratic integer programs as long as the scenarios can be divided into relatively few
groups that share the same objective functions and constraint matrices. For such instances, the
value function must be found for the first stage and for each group of scenarios.

The major limitation of our two-phase solution approach is the explicit storage of value functions
in computer memory. This is why our computations are based on instances that have a large number
of columns and scenarios but relatively few rows. One approach to overcome this limitation is to
seek more efficient ways to store value functions, such as using generating functions [96]. Another
approach is to modify the global branch-and-bound algorithm to calculate the solution on a subset
of right-hand sides so that only a portion of the value function needs to be stored at any time.
3  Two-Stage Stochastic Quadratic Binary Programs

This  chapter  is  mostly  based  on  the  results  from: 

•  Z.  Zhu,  N.  Kong,  O.A.  Prokopyev,  “A  New  Lagrangian  Decomposition  Based  Approach  for 
Quadratic  Binary  Programs,”  Technical  Report,  2011. 

•  Z.  Zhu,  N.  Kong,  O.A.  Prokopyev,  “Two-Stage  Stochastic  Quadratic  Binary  Program  with 
Recourse:  A  Dual  Decomposition  Approach,”  Working  paper,  2011. 


3.1  Introduction 


A two-stage stochastic quadratic binary program with fixed recourse (SQBP) can be presented as:

$$(\text{SQBP}):\quad \max_{x}\ c^T x + x^T C x + \sum_{k=1}^{K} p^k Q(x, k)$$

$$\text{s.t.}\quad Ax \le b;$$

and for each $k = 1, \dots, K$,

$$Q(x, k) = \max_{y}\ (d^k)^T y + y^T D^k y$$

$$\text{s.t.}\quad Wy \le h^k - T^k x;$$

$$y \in \{0, 1\}^{n_2}.$$

In Problem (SQBP), matrices $C := \{c_{ij}\} \in \mathbb{R}^{n_1 \times n_1}$ and
$D^k := \{d_{ij}^k\} \in \mathbb{R}^{n_2 \times n_2}$ for $k = 1, \dots, K$ contain first-stage and
second-stage objective coefficients for the quadratic terms $x_i x_j$ and $y_i y_j$, respectively,
and vectors $c := \{c_i\} \in \mathbb{R}^{n_1}$ and $d^k := \{d_i^k\} \in \mathbb{R}^{n_2}$ for
$k = 1, \dots, K$ contain first-stage and second-stage objective coefficients for the linear terms
$x_i$ and $y_i$, respectively. In addition, matrices $A \in \mathbb{R}^{m_1 \times n_1}$,
$W \in \mathbb{R}^{m_2 \times n_2}$, and $T^k \in \mathbb{R}^{m_2 \times n_1}$ for $k = 1, \dots, K$
are known real matrices, and vectors $b \in \mathbb{R}^{m_1}$ and $h^k \in \mathbb{R}^{m_2}$ for
$k = 1, \dots, K$ are known real vectors. In this problem, a decision maker takes the first-stage
decisions and then takes the second-stage recourse decision based on some realization of the
uncertainty, which is not exogenous with respect to first-stage decisions. The objective is to
maximize the sum of the first-stage benefit and the expected second-stage benefit. To avoid
complications when computing the expectation, we assume that we only have a finite number $K$ of
scenarios. Hence, each scenario $k = 1, \dots, K$, having probability $p^k$, is represented by
$(d^k, D^k, h^k, T^k)$.

The problem (SQBP) is equivalent to a large, dual block-angular quadratic binary program. For
$k = 1, \dots, K$, we define the set

$$F^k := \{(x, y^k) : Ax \le b,\ x \in \mathbb{B}^{n_1},\ T^k x + W y^k \le h^k,\ y^k \in \mathbb{B}^{n_2}\}.$$

Then the deterministic equivalent of (SQBP) can be written as

$$z = \max \left\{ c^T x + x^T C x + \sum_{k=1}^{K} p^k \left( (d^k)^T y^k + (y^k)^T D^k y^k \right) : (x, y^k) \in F^k,\ k = 1, \dots, K \right\}. \tag{4}$$


It is important to develop efficient solution methods for SQBPs for the following two reasons.
First, deterministic quadratic binary programs (QBPs) have been extensively studied in scheduling
[10], computer-aided design [17, 84], and computational biology [60], among others, and many
graph-theoretic problems can be naturally formulated as QBPs [2, 115]. Second, optimization under
uncertainty has been shown to be critical in many decision problems: stochastic linear binary
programs (SLBPs), the linear counterpart of SQBPs (i.e., $C = D = 0$), have been applied in many
fields such as energy planning [86], manufacturing [47], and logistics [92].

It is no surprise that solving SQBPs is even more computationally demanding, as SQBPs inherit
computational challenges from both stochastic integer programming and deterministic quadratic
binary optimization. Very little work has been done to develop efficient algorithms for SQBPs.
In this chapter, we adapt dual decomposition (also termed scenario decomposition), an approach
originally developed by Carøe and Schultz [33] for solving stochastic integer programs (SIPs).
The idea of scenario decomposition is to introduce copies $x^1, \dots, x^K$ of the first-stage
variable $x$ and then rewrite Problem (4) in the form


$$\max_{x^1, \dots, x^K,\ y^1, \dots, y^K} \left\{ \sum_{k=1}^{K} p^k \left( c^T x^k + (x^k)^T C x^k + (d^k)^T y^k + (y^k)^T D^k y^k \right) : (x^k, y^k) \in F^k,\ k = 1, \dots, K,\ x^1 = \dots = x^K \right\}. \tag{5}$$


Here the non-anticipativity condition $x^1 = \dots = x^K$ states that the first-stage decision
should not depend on the scenario that is realized in the second stage. To solve (5), one can apply
Lagrangian relaxation with respect to the non-anticipativity condition and recover the identity
among the first-stage variables with branch and bound. A major advantage of this solution approach
is that it splits into separate subproblems for different scenarios. Note that the idea of dual
decomposition can be related to existing techniques in both combinatorial optimization (e.g.,
[74, 83]) and stochastic programming (e.g., [131, 132]).
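To make the scenario split concrete, here is a minimal sketch that relaxes the copies constraint on a toy SQBP and evaluates the resulting bound one scenario at a time. It assumes a simple linear representation $x^1 = x^k$ for $k = 2, \dots, K$ with one multiplier vector per relaxed equality, and uses enumeration in place of a solver. The instance data and all names (`inst`, `scenario_points`, `exact_value`, `lagrangian_bound`) are hypothetical.

```python
from itertools import product


def quad(lin, Q, v):
    """Return lin^T v + v^T Q v."""
    n = len(v)
    return (sum(lin[i] * v[i] for i in range(n))
            + sum(Q[i][j] * v[i] * v[j] for i in range(n) for j in range(n)))


def rows_ok(rows, v, rhs):
    """Check rows * v <= rhs componentwise."""
    return all(sum(r[i] * v[i] for i in range(len(v))) <= b for r, b in zip(rows, rhs))


def scenario_points(inst, k):
    """Yield (x, probability-weighted objective) for every (x, y) feasible in scenario k."""
    s = inst["scen"][k]
    for x in product((0, 1), repeat=inst["n1"]):
        if not rows_ok(inst["A"], x, inst["b"]):
            continue
        # Second-stage right-hand side h^k - T^k x.
        rhs = [h - sum(t[i] * x[i] for i in range(len(x))) for h, t in zip(s["h"], s["T"])]
        for y in product((0, 1), repeat=inst["n2"]):
            if rows_ok(inst["W"], y, rhs):
                yield x, s["p"] * (quad(inst["c"], inst["C"], x) + quad(s["d"], s["D"], y))


def exact_value(inst):
    """Brute-force optimum with one common first-stage vector x for all scenarios."""
    best = float("-inf")
    for x in product((0, 1), repeat=inst["n1"]):
        total = 0.0
        for k in range(len(inst["scen"])):
            vals = [v for xk, v in scenario_points(inst, k) if xk == x]
            if not vals:
                break
            total += max(vals)
        else:
            best = max(best, total)
    return best


def lagrangian_bound(inst, mu):
    """Upper bound obtained by relaxing x^1 = x^k (k >= 2) with multipliers mu[k - 2]."""
    K, n1 = len(inst["scen"]), inst["n1"]
    total = 0.0
    for k in range(K):
        if k == 0:
            lin = [sum(m[i] for m in mu) for i in range(n1)]
        else:
            lin = [-mu[k - 1][i] for i in range(n1)]
        # Each scenario is maximized independently of the others.
        total += max(v + sum(l * xi for l, xi in zip(lin, x))
                     for x, v in scenario_points(inst, k))
    return total


inst = {
    "n1": 2, "n2": 1, "c": [1, 2], "C": [[0, -1], [-1, 0]],
    "A": [[1, 1]], "b": [2], "W": [[1]],
    "scen": [
        {"p": 0.5, "d": [3], "D": [[0]], "h": [2], "T": [[1, 1]]},
        {"p": 0.5, "d": [1], "D": [[0]], "h": [1], "T": [[1, 0]]},
    ],
}
print(exact_value(inst), lagrangian_bound(inst, [[1.0, -1.0]]))
```

For any choice of multipliers the relaxed value is an upper bound on the true optimum, because every solution with identical copies incurs zero penalty.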

For SIPs, there exist several equivalent representations of the non-anticipativity condition. In
general, it can be represented by the equality $\sum_{k=1}^{K} \Gamma^k x^k = 0$, where
$\Gamma = (\Gamma^1, \dots, \Gamma^K)$ is a suitable matrix. One notable representation in SLBP is
via the single constraint

$$\left( \sum_{k=2}^{K} \gamma^k \right) x^1 = \gamma^2 x^2 + \dots + \gamma^K x^K, \tag{6}$$

where $\gamma^2, \dots, \gamma^K$ are positive weights.
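A quick way to see why one weighted constraint can stand in for $x^1 = \dots = x^K$ when the first-stage variables are binary is to check it exhaustively on a small case. The sketch below assumes the single constraint is read in the normalized form $(\sum_{k \ge 2} \gamma^k)\, x^1 = \sum_{k \ge 2} \gamma^k x^k$; that reading, the helper names, and the weights used are assumptions made for this example.

```python
from itertools import product


def single_constraint_holds(xs, gammas):
    """Check (sum of weights) * x^1 == sum_k gamma^k * x^k componentwise, for k >= 2."""
    total = sum(gammas)
    n = len(xs[0])
    return all(
        abs(total * xs[0][i] - sum(g * x[i] for g, x in zip(gammas, xs[1:]))) < 1e-12
        for i in range(n)
    )


def all_equal(xs):
    return all(x == xs[0] for x in xs)


def check_equivalence(K, n, gammas):
    """True if the single constraint and x^1 = ... = x^K agree on every binary assignment."""
    vectors = list(product((0, 1), repeat=n))
    return all(
        single_constraint_holds(xs, gammas) == all_equal(xs)
        for xs in product(vectors, repeat=K)
    )


print(check_equivalence(3, 2, [0.7, 1.9]))   # positive weights
print(check_equivalence(3, 1, [1.0, -1.0]))  # a negative weight breaks the equivalence
```

Positivity matters: with weights of mixed sign, copies that differ from $x^1$ can cancel each other and still satisfy the constraint.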

When extending the idea of dual decomposition to SQBP, we enjoy more flexibility in imposing
the non-anticipativity condition. Feasible representations may involve quadratic constraints, e.g.,
$\sum_{k=1}^{K} \Lambda^k x^k (x^k)^T = 0$, where $\Lambda = (\Lambda^1, \dots, \Lambda^K)$ is a
matrix. Each feasible representation leads to a Lagrangian decomposition scheme. It is well known
that there exists a tradeoff between the solution efficiency and bounding quality of Lagrangian
duals. In this chapter, we explore a few representations that involve quadratic constraints and
investigate the corresponding dual decomposition schemes. For each deterministic quadratic binary
Lagrangian dual subproblem, we propose an innovative Lagrangian decomposition based
branch-and-bound method that is inspired by the idea of variable splitting. In fact, the proposed
method is applicable to generic quadratic binary programs.

The  remainder  of  the  chapter  is  organized  as  follows.  In  Section  3.2,  we  introduce  several 
representations  of  the  non-anticipativity  condition.  We  derive  the  corresponding  Lagrangian  duals 
and  compare  their  tightness  analytically.  In  Section  3.3,  we  report  preliminary  computational 
experiments  on  bound  tightness  and  solution  efficiency  of  the  SQBPs.  In  Section  3.4,  we  present 
the  innovative  Lagrangian  decomposition  based  approach  for  general  quadratic  binary  programs. 
In Section 3.5, we draw conclusions and outline future research.


3.2  Alternative  Dual  Decomposition  Schemes 

In this section, we explore alternative dual decomposition schemes with respect to several
representations of the non-anticipativity condition. The two key factors are 1) the inclusion of
quadratic constraints to represent the non-anticipativity condition and 2) the order in which
linearization of quadratic cross terms and dual decomposition over scenarios are applied.

We present the two generic forms of the Lagrangian relaxation with respect to the
non-anticipativity condition. First, we represent the condition only with linear constraints. The
Lagrangian relaxation is the problem of finding $x^k$, $y^k$, $k = 1, \dots, K$, such that

$$D(\mu^{\gamma}) = \max \left\{ \sum_{k=1}^{K} L_k(x^k, y^k, \mu^{\gamma}) : (x^k, y^k) \in F^k \right\}, \tag{7}$$

where $\mu^{\gamma}$ has proper dimension and
$L_k(x^k, y^k, \mu^{\gamma}) = p^k \left( c^T x^k + (x^k)^T C x^k + (d^k)^T y^k + (y^k)^T D^k y^k \right) + \mu^{\gamma}(\Gamma^k x^k)$
for $k = 1, \dots, K$. The Lagrangian dual of Problem (7) then becomes the problem

$$z^{LD} := \min_{\mu^{\gamma}} D(\mu^{\gamma}). \tag{8}$$

Second, we represent the condition with both linear and quadratic constraints. Thus, the Lagrangian
relaxation is

$$D'(\mu^{\gamma}, \mu^{\lambda}) = \max \left\{ \sum_{k=1}^{K} L'_k(x^k, y^k, \mu^{\gamma}, \mu^{\lambda}) : (x^k, y^k) \in F^k \right\}, \tag{9}$$

where $\mu^{\gamma}$ and $\mu^{\lambda}$ have proper dimensions, and
$L'_k(x^k, y^k, \mu^{\gamma}, \mu^{\lambda}) = p^k \left( c^T x^k + (x^k)^T C x^k + (d^k)^T y^k + (y^k)^T D^k y^k \right) + \mu^{\gamma}(\Gamma^k x^k) + \mu^{\lambda}(\Lambda^k x^k (x^k)^T)$
for $k = 1, \dots, K$. The Lagrangian dual of Problem (9) is

$$z_Q^{LD} := \min_{\mu^{\gamma}, \mu^{\lambda}} D'(\mu^{\gamma}, \mu^{\lambda}). \tag{10}$$

Since there is an enormous number of possible representations of the non-anticipativity condition,
i.e., enormously many forms of $\Gamma$ and/or $\Lambda$, we consider four commonly used
representations. The first two representations are universally applicable. They use at least one
constraint to force the identity of first-stage variables with respect to each pair of scenarios.
For the first representation, we impose constraints $x^{k_1} = x^{k_2}$ for all
$1 \le k_1 < k_2 \le K$. For the second representation, in addition to imposing the above
constraints, we impose constraints $x^{k_1} (x^{k_1})^T = x^{k_2} (x^{k_2})^T$ for all
$1 \le k_1 < k_2 \le K$. In other words, for any cross term of first-stage variables, we impose
constraints to force the scenario-wise identity. With proper specifications of $\Gamma$ and
$\Lambda$, we can further derive (7) and (9). We denote the corresponding Lagrangian duals by
$z_1^{LD}$ and $z_{1Q}^{LD}$, respectively.

The second set of two representations of the non-anticipativity condition that we consider in
this chapter is applicable when all first-stage variables are required to be binary, which is the
case in our problem. These two representations use one or two constraints to force the
scenario-wise identity. To distinguish them from the two representations described in the previous
paragraph, we call them the third and fourth representations. For the third representation, we
impose constraints (6) for all pairs of scenarios. For the fourth representation, in addition to
imposing constraints (6), we impose constraints

$$\left( \sum_{k=2}^{K} \lambda^k \right) x^1 (x^1)^T = \sum_{k=2}^{K} \lambda^k x^k (x^k)^T. \tag{11}$$
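With the third representation, we further write the Lagrangian relaxation as follows. For $k = 1$,
the Lagrangian relaxation associated with scenario $k = 1$ is

$$D_1(\mu^{\gamma}) = \max_{(x^1, y^1) \in F^1} \left\{ p^1 \left( \Big( c^T - \frac{1}{p^1} \Big( \sum_{k=2}^{K} \gamma^k \Big) (\mu^{\gamma})^T \Big) x^1 + (x^1)^T C x^1 + (d^1)^T y^1 + (y^1)^T D^1 y^1 \right) \right\}. \tag{12}$$

For $k = 2, \dots, K$, the Lagrangian relaxation associated with scenario $k$ is

$$D_k(\mu^{\gamma}) = \max_{(x^k, y^k) \in F^k} \left\{ p^k \left( \Big( c^T + \frac{\gamma^k}{p^k} (\mu^{\gamma})^T \Big) x^k + (x^k)^T C x^k + (d^k)^T y^k + (y^k)^T D^k y^k \right) \right\}. \tag{13}$$

Then the Lagrangian dual, denoted by $z_2^{LD}$, is

$$z_2^{LD} := \min_{\mu^{\gamma}} \sum_{k=1}^{K} D_k(\mu^{\gamma}). \tag{14}$$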


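With the fourth representation, we further write the Lagrangian relaxation as follows. For $k = 1$,
the Lagrangian relaxation associated with scenario $k = 1$ is

$$D'_1(\mu^{\gamma}, \mu^{\lambda}) = \max_{(x^1, y^1) \in F^1} \left\{ p^1 \left( \Big( c^T - \frac{1}{p^1} \Big( \sum_{k=2}^{K} \gamma^k \Big) (\mu^{\gamma})^T \Big) x^1 + (x^1)^T \Big( C - \frac{1}{p^1} \Big( \sum_{k=2}^{K} \lambda^k \Big) \mu^{\lambda} \Big) x^1 + (d^1)^T y^1 + (y^1)^T D^1 y^1 \right) \right\}. \tag{15}$$

For $k = 2, \dots, K$, the Lagrangian relaxation associated with scenario $k$ is

$$D'_k(\mu^{\gamma}, \mu^{\lambda}) = \max_{(x^k, y^k) \in F^k} \left\{ p^k \left( \Big( c^T + \frac{\gamma^k}{p^k} (\mu^{\gamma})^T \Big) x^k + (x^k)^T \Big( C + \frac{\lambda^k}{p^k} \mu^{\lambda} \Big) x^k + (d^k)^T y^k + (y^k)^T D^k y^k \right) \right\}. \tag{16}$$

Then the Lagrangian dual, denoted by $z_{2Q}^{LD}$, is

$$z_{2Q}^{LD} := \min_{\mu^{\gamma}, \mu^{\lambda}} \sum_{k=1}^{K} D'_k(\mu^{\gamma}, \mu^{\lambda}). \tag{17}$$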


Remark 1  Several results follow trivially. They are: 1) $z_Q^{LD} \le z^{LD}$; 2) $z_{1Q}^{LD} \le z_1^{LD}$; and 3) $z_{2Q}^{LD} \le z_2^{LD}$.

Remark 2  The approach of imposing non-anticipativity constraints on quadratic terms ($z_Q^{LD}$,
$z_{1Q}^{LD}$, $z_{2Q}^{LD}$) is identical to first applying standard linearization on first-stage
variables and then applying dual decomposition, i.e.,
$\left( \sum_{k=2}^{K} \lambda^k \right) z_{ij}^1 = \sum_{k=2}^{K} \lambda^k z_{ij}^k$ with
$z_{ij}^k = x_i^k x_j^k$ for $1 \le i < j \le n_1$.


There  is  a  common  feature  in  the  above  described  dual  decomposition  schemes.  That  is,  all 
Lagrangian  duals  are  derived  without  linearizing  the  cross  terms  of  first-stage  decision  variables. 
Next  we  describe  several  alternative  dual  decomposition  schemes,  which  are  related  to  the  following 
reformulation  of  Problem  (4). 


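$$z = \max \left\{ \sum_{i=1}^{n_1} c_i x_i + \sum_{i=1}^{n_1 - 1} \sum_{j=i+1}^{n_1} c_{ij} z_{ij} + \sum_{k=1}^{K} p^k \left( \sum_{i=1}^{n_2} d_i^k y_i^k + \sum_{i=1}^{n_2 - 1} \sum_{j=i+1}^{n_2} d_{ij}^k y_i^k y_j^k \right) : z_{ij} = x_i x_j,\ (x, y^k) \in F^k \right\}. \tag{18}$$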


In formulation (18), without loss of generality, we assume that $c_{ij} = c_{ji}$ for
$i, j = 1, \dots, n_1$, $i < j$, and $d_{ij}^k = d_{ji}^k$ for $i, j = 1, \dots, n_2$, $i < j$. It
is easy to see that (18) is equivalent to Problem (4). We next impose the non-anticipativity
condition only on linear terms of first-stage variables. We rewrite Problem (18) as follows.


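$$\max\ \sum_{k=1}^{K} p^k \left( \sum_{i=1}^{n_1} c_i x_i^k + \sum_{i=1}^{n_2} d_i^k y_i^k + \sum_{i=1}^{n_2 - 1} \sum_{j=i+1}^{n_2} d_{ij}^k y_i^k y_j^k \right) + \sum_{i=1}^{n_1 - 1} \sum_{j=i+1}^{n_1} c_{ij} z_{ij} \tag{19}$$

subject to

$$(x^k, y^k) \in F^k, \quad k = 1, \dots, K, \tag{20}$$

$$\sum_{k=1}^{K} \Gamma^k x^k = 0, \tag{21}$$

$$z_{ij} \le x_i^k, \quad i, j = 1, \dots, n_1,\ j > i,\ k = 1, \dots, K, \tag{22}$$

$$z_{ij} \le x_j^k, \quad i, j = 1, \dots, n_1,\ j > i,\ k = 1, \dots, K, \tag{23}$$

$$z_{ij} \ge x_i^k + x_j^k - 1, \quad i, j = 1, \dots, n_1,\ j > i,\ k = 1, \dots, K, \tag{24}$$

$$z_{ij} \in [0, 1], \quad i, j = 1, \dots, n_1,\ j > i. \tag{25}$$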

Note  that  to  ensure  optimality,  variables  z  are  only  required  to  be  continuous  between  0  and  1. 
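The remark that continuous $z$ suffices can be checked directly: for binary $x_i$ and $x_j$, the three linearization inequalities together with the bounds $0 \le z \le 1$ leave exactly one feasible value, namely $x_i x_j$. The sketch below computes the feasible interval for $z$; the helper names `z_interval` and `linearization_is_exact` are hypothetical.

```python
from itertools import product


def z_interval(xi, xj):
    """Feasible range of z under z <= xi, z <= xj, z >= xi + xj - 1 and 0 <= z <= 1."""
    lo = max(0, xi + xj - 1)
    hi = min(1, xi, xj)
    return lo, hi


def linearization_is_exact():
    """True if the interval collapses to the product xi * xj for every binary pair."""
    return all(z_interval(xi, xj) == (xi * xj, xi * xj)
               for xi, xj in product((0, 1), repeat=2))


print(linearization_is_exact())
print(z_interval(0.5, 0.5))  # fractional x leaves slack, so binary x is essential
```

The second call shows that the argument relies on the first-stage variables being binary: at a fractional point the interval no longer pins $z$ to the product.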

In the following presentation, we fix the representation of the non-anticipativity condition to be
(6). Therefore, we replace (21) with (6). We apply Lagrangian relaxation with respect to both (21)
and (22)-(24) as follows. For each $i$, we denote by $\mu_i$ the Lagrangian multiplier associated
with (21). For each $i, j = 1, \dots, n_1$ with $j > i$ and each $k = 1, \dots, K$, we denote by
$\theta_{ij}^k$ and $\theta_{ji}^k$ the Lagrangian multipliers associated with constraints (22)
and (23), respectively, and by $\lambda_{ij}^k$ the Lagrangian multiplier associated with
constraint (24).

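For $k = 1$, the Lagrangian relaxation associated with scenario 1 is

$$D''_1(\mu, \theta^1, \lambda^1) = \max \Bigg\{ \sum_{i=1}^{n_1} \Big( p^1 c_i - \Big( \sum_{k=2}^{K} \gamma^k \Big) \mu_i + \sum_{j \ne i} \theta_{ij}^1 - \sum_{j=i+1}^{n_1} \lambda_{ij}^1 - \sum_{j=1}^{i-1} \lambda_{ji}^1 \Big) x_i^1 + \sum_{i=1}^{n_2} p^1 d_i^1 y_i^1 + \sum_{i=1}^{n_2 - 1} \sum_{j=i+1}^{n_2} p^1 d_{ij}^1 y_i^1 y_j^1$$
$$\qquad + \sum_{i=1}^{n_1 - 1} \sum_{j=i+1}^{n_1} \big( p^1 c_{ij} - \theta_{ij}^1 - \theta_{ji}^1 + \lambda_{ij}^1 \big) z_{ij} + \sum_{i=1}^{n_1 - 1} \sum_{j=i+1}^{n_1} \lambda_{ij}^1 : (x^1, y^1) \in F^1,\ z_{ij} \in [0, 1] \Bigg\}. \tag{26}$$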

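For $k = 2, \dots, K$, the Lagrangian relaxation associated with scenario $k$ is

$$D''_k(\mu, \theta^k, \lambda^k) = \max \Bigg\{ \sum_{i=1}^{n_1} \Big( p^k c_i + \gamma^k \mu_i + \sum_{j \ne i} \theta_{ij}^k - \sum_{j=i+1}^{n_1} \lambda_{ij}^k - \sum_{j=1}^{i-1} \lambda_{ji}^k \Big) x_i^k + \sum_{i=1}^{n_2} p^k d_i^k y_i^k + \sum_{i=1}^{n_2 - 1} \sum_{j=i+1}^{n_2} p^k d_{ij}^k y_i^k y_j^k$$
$$\qquad + \sum_{i=1}^{n_1 - 1} \sum_{j=i+1}^{n_1} \big( p^k c_{ij} - \theta_{ij}^k - \theta_{ji}^k + \lambda_{ij}^k \big) z_{ij} + \sum_{i=1}^{n_1 - 1} \sum_{j=i+1}^{n_1} \lambda_{ij}^k : (x^k, y^k) \in F^k,\ z_{ij} \in [0, 1] \Bigg\}. \tag{27}$$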

Therefore, the corresponding Lagrangian dual, denoted by $z_3^{LD}$, is

$$z_3^{LD} := \min_{\mu, \theta, \lambda} \sum_{k=1}^{K} D''_k(\mu, \theta^k, \lambda^k). \tag{28}$$


Note that in each of the above Lagrangian relaxations, variables $z_{ij}$ are unconstrained except
for the bounds. Hence, in (26) and (27), for scenario $k$, $k = 1, \dots, K$, the optimal solution
is $z_{ij}^* = 1$ if $p^k c_{ij} - \theta_{ij}^k - \theta_{ji}^k + \lambda_{ij}^k > 0$, and
$z_{ij}^* = 0$ otherwise.

Remark 3  Since to compute $z_3^{LD}$ one does not relax the non-anticipativity constraints on
variables $z_{ij}$, we have $z_3^{LD} \le z_{2Q}^{LD}$.

There are a large number of constraints (22)-(24) when $K$ is large, which presents a significant
computational challenge in computing $z_3^{LD}$. Hence, we consider alternative Lagrangian
relaxations that are easier to compute but provide inferior bounds. By the non-anticipativity
constraint (21), it is sufficient to keep the set of constraints (22)-(24) for only one scenario.
Let us define $z_{3(k)}^{LD}$ to be the Lagrangian dual if we impose constraints (22)-(24) only for
a selected scenario $k$. Furthermore, one can consider replacing (22)-(24) with the following
aggregate constraint sets:

$$\left( \sum_{k=1}^{K} (\alpha_i^k + \alpha_j^k) \right) z_{ij} \le \sum_{k=1}^{K} (\alpha_i^k x_i^k + \alpha_j^k x_j^k), \quad i, j = 1, \dots, n_1,\ j > i; \tag{29}$$

$$\left( \sum_{k=1}^{K} \beta_{ij}^k \right) z_{ij} \ge \sum_{k=1}^{K} \beta_{ij}^k (x_i^k + x_j^k - 1), \quad i, j = 1, \dots, n_1,\ j > i. \tag{30}$$

In (29), we introduce coefficients $\alpha_i^k$ for $i = 1, \dots, n_1$ and $k = 1, \dots, K$,
which are associated with constraints (22) and (23). In (30), we introduce coefficients
$\beta_{ij}^k$ for $i, j = 1, \dots, n_1$, $i < j$, and $k = 1, \dots, K$, which are associated
with constraint (24). However, constraints (25) need to be replaced by

$$z_{ij} \in \{0, 1\}, \quad i, j = 1, \dots, n_1,\ j > i. \tag{31}$$

We apply Lagrangian relaxation with respect to the alternative non-anticipativity constraints (29)
and (30), and constraints (31), to compute an alternative Lagrangian dual as follows. For each
index pair $(i, j)$ with $i < j$, we denote by $\theta_{ij}$ and $\lambda_{ij}$ the Lagrangian
multipliers associated with constraints (29) and (30).

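For $k = 1$, the Lagrangian relaxation associated with scenario 1 is

$$D'''_1(\mu, \theta, \lambda) = \max \Bigg\{ \sum_{i=1}^{n_1} \Big( p^1 c_i - \Big( \sum_{k=2}^{K} \gamma^k \Big) \mu_i + \alpha_i^1 \Big( \sum_{j=i+1}^{n_1} \theta_{ij} + \sum_{j=1}^{i-1} \theta_{ji} \Big) - \sum_{j=i+1}^{n_1} \beta_{ij}^1 \lambda_{ij} - \sum_{j=1}^{i-1} \beta_{ji}^1 \lambda_{ji} \Big) x_i^1$$
$$\qquad + \sum_{i=1}^{n_2} p^1 d_i^1 y_i^1 + \sum_{i=1}^{n_2 - 1} \sum_{j=i+1}^{n_2} p^1 d_{ij}^1 y_i^1 y_j^1$$
$$\qquad + \sum_{i=1}^{n_1 - 1} \sum_{j=i+1}^{n_1} \big( p^1 c_{ij} - (\alpha_i^1 + \alpha_j^1) \theta_{ij} + \beta_{ij}^1 \lambda_{ij} \big) z_{ij} + \sum_{i=1}^{n_1 - 1} \sum_{j=i+1}^{n_1} \beta_{ij}^1 \lambda_{ij} : (x^1, y^1) \in F^1,\ z_{ij} \in \{0, 1\} \Bigg\}. \tag{32}$$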


Inst. No.    $z_{3A}^{LD}$        $z_{2Q}^{LD}$
1            1863.19 (2.04)       1883.39 (3.00)
2            1989.14 (1.45)       2008.64 (3.67)
3            2010.63 (2.84)       2134.25 (3.74)
4            1899.41 (2.46)       1843.52 (3.30)
5            1855.97 (4.73)       1897.65 (5.43)

Table 1: Computational results on 5 randomly generated SQBP instances (CPU times in seconds in
parentheses). Note that we relaxed the binary restrictions on the second stage.


For  k  =  2, . . . ,  K,  the  Lagrangian  relaxation  associated  with  scenario  k,  is  as 


f  ni 

(  n  1 

i—  1 

ni 

i-1  \ 

D'k(n,  0,  A)  =  max  l  y 

pk Ci  +  pi  +  ak  (  y  ^  9jj 

+ y  %) 

y  4j^ij 

-£ 

l-i 

V  j=i+ 1 

3= 1 

j=i+ 1 

i=>  / 

ri2  U2  —  1  ri2 

+Y/pkd^+j2Y/Pkd^ 

i=  1  i=l  j= 2 

n\  —  1  m  n\  —  l  n\ 

+  y  yy  ~  (“f + aj)9ij + Pij\j)zij + y  y  ^ 

i=l  j=2  i=l  jf=2 

Therefore,  the  corresponding  Lagrangian  dual,  denoted  by  ■  is  as 

K 

4a  =  min  y  D'l'iii,  9,  A) .  (34) 

Note  that  in  each  of  the  above  Lagrangian  relaxations,  variables  Zij  are  again  unconstrained  except 
for  the  bounds.  Hence,  they  are  easily  obtained. 

Although  it  is  clear  that  both  zk^  >  4°  and  4q  —  4D  i  if  is  unclear  the  comparison  between 
$z_{LD}^{3A}$ and $z_{LD}^{2Q}$. We explore the computational tradeoff between the two bounds in the preliminary
numerical study presented next.

3.3  Preliminary  Numerical  Study 

We conducted preliminary computational experiments to investigate the two Lagrangian relaxation
bounds $z_{LD}^{3A}$ and $z_{LD}^{2Q}$. At this initial stage of computational experimentation, we relaxed the binary
restrictions on the second-stage decision variables to be able to test more and larger instances.
We implemented the test instance generator in Python. To compute $z_{LD}^{3A}$ and $z_{LD}^{2Q}$, we applied a
standard subgradient method and used CPLEX 12.0 to solve the required linear programs.

In Table 1, we present the comparative results for five small test instances, each of which has ten
first-stage decision variables, twenty second-stage decision variables, two first-stage constraints, two
second-stage constraints, and twenty scenarios, i.e., $n_1 = 10$, $n_2 = 20$, $m_1 = 2$, $m_2 = 2$, $K = 20$. In the
table, the numbers in parentheses are the associated CPU times in seconds. The computational
results suggest that, in general, computing $z_{LD}^{3A}$ is less time consuming than computing $z_{LD}^{2Q}$. However,
it is not clear which of the two bounds is superior. Our conclusions hold for other classes of
test instances.
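The comparison drawn from Table 1 can be recomputed directly from the reported numbers. The sketch below only transcribes the table; the names `bound_3a` and `bound_2q` are labels chosen here for the two columns and are not part of the original study.

```python
# (bound value, CPU seconds) per instance, transcribed from Table 1.
# The keys "bound_3a" and "bound_2q" are labels assumed for this sketch.
table1 = {
    1: {"bound_3a": (1863.19, 2.04), "bound_2q": (1883.39, 3.00)},
    2: {"bound_3a": (1989.14, 1.45), "bound_2q": (2008.64, 3.67)},
    3: {"bound_3a": (2010.63, 2.84), "bound_2q": (2134.25, 3.74)},
    4: {"bound_3a": (1899.41, 2.46), "bound_2q": (1843.52, 3.30)},
    5: {"bound_3a": (1855.97, 4.73), "bound_2q": (1897.65, 5.43)},
}

n_inst = len(table1)
mean_time_3a = sum(row["bound_3a"][1] for row in table1.values()) / n_inst
mean_time_2q = sum(row["bound_2q"][1] for row in table1.values()) / n_inst

# Instances on which the first bound is computed faster / has the smaller value.
faster_3a = [k for k, row in table1.items()
             if row["bound_3a"][1] < row["bound_2q"][1]]
smaller_3a = [k for k, row in table1.items()
              if row["bound_3a"][0] < row["bound_2q"][0]]

print(mean_time_3a, mean_time_2q, faster_3a, smaller_3a)
```

On these five instances the first bound is faster to compute every time, while its value is smaller on four instances and larger on one, which is why neither bound can be declared superior from this sample.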


$(x^k, y^k) \in T^k,\ z_{ij} \in \{0,1\}$.  (33)
3.4  A  Lagrangian  Decomposition  Approach  for  QBPs

3.4.1  Introduction 

One of the most well-known and studied classes of nonlinear integer optimization problems is the
maximization of a quadratic 0-1 function subject to a set of linear 0-1 constraints:

\[
\begin{aligned}
(P0):\quad \max\ & \sum_{i=1}^{n} c_i x_i + \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} c_{ij} x_i x_j && (35)\\
\text{s.t.}\ & \sum_{i=1}^{n} a_{ki} x_i \le b_k, \quad k = 1, \ldots, m; && (36)\\
& x_i \in \{0, 1\}, \quad i = 1, \ldots, n,
\end{aligned}
\]

where $c_i, c_{ij} \in \mathbb{R}$ for $1 \le i \ne j \le n$ and $b_k, a_{ki} \in \mathbb{R}$ for $k = 1, \ldots, m$ and $i = 1, \ldots, n$. Problem (P0)
is typically referred to as a constrained quadratic binary problem (QBP) [25]. Since $x_i^2 = x_i$ for each binary
variable, any term $c_{ii} x_i^2$ can be rewritten as $c_i x_i$ with $c_i = c_{ii}$. Without loss of generality, we
also assume that $c_{ij} = c_{ji}$ for any pair $(i, j)$ with $i \ne j$.
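To make the structure of (P0) concrete, the following minimal sketch evaluates objective (35) and constraints (36) and solves a tiny instance by complete enumeration. The instance data and all function names are hypothetical and chosen only for illustration; enumeration is not the solution method discussed in this report and is practical only for very small n.

```python
from itertools import product

def qbp_objective(c_lin, c_quad, x):
    """Objective (35): sum_i c_i x_i + sum_{i != j} c_ij x_i x_j."""
    n = len(x)
    linear = sum(c_lin[i] * x[i] for i in range(n))
    quadratic = sum(c_quad[i][j] * x[i] * x[j]
                    for i in range(n) for j in range(n) if i != j)
    return linear + quadratic

def is_feasible(a, b, x):
    """Constraints (36): sum_i a_ki x_i <= b_k for every row k."""
    return all(sum(a_k[i] * x[i] for i in range(len(x))) <= b_k
               for a_k, b_k in zip(a, b))

def solve_by_enumeration(c_lin, c_quad, a, b):
    """Enumerate all 2^n binary vectors and keep the best feasible one."""
    best_x, best_val = None, float("-inf")
    for x in product((0, 1), repeat=len(c_lin)):
        if is_feasible(a, b, x):
            val = qbp_objective(c_lin, c_quad, x)
            if val > best_val:
                best_x, best_val = x, val
    return best_x, best_val

# Hypothetical 3-variable instance with symmetric c_ij and one knapsack row.
c_lin = [1, 2, 3]
c_quad = [[0, 2, -1],
          [2, 0, 2],
          [-1, 2, 0]]
a = [[1, 1, 1]]
b = [2]
best_x, best_val = solve_by_enumeration(c_lin, c_quad, a, b)
print(best_x, best_val)
```

Because the sum in (35) runs over ordered pairs, each unordered pair contributes $c_{ij} + c_{ji} = 2c_{ij}$ under the symmetry assumption, which the double loop reproduces.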

A quadratic term that represents a pair of binary variables arises naturally in modeling
interactions among entities. Furthermore, given the fact that optimization of general pseudo-boolean
functions can be reduced in polynomial time to optimization of a quadratic binary function [25],
QBP is, arguably, the most important class of the general pseudo-boolean optimization problem.
Many important problems in engineering, physics, chemistry, biology, medicine and a variety of
other application domains can be formulated as QBPs. To name a few, such problems have been
studied in scheduling [10], computer-aided design [17, 84], solid-state physics [15, 17], protein
design [62, 87], computational biology [60], and epileptic seizure prediction [81]. In addition, many
graph-theoretic problems can be naturally formulated as QBPs, including the well-studied maximum
clique and maximum independent set problems [2, 115, 120, 121].

There  are  only  a  limited  number  of  classes  of  QBPs  known  to  be  polynomially  solvable  [11,  16, 
25, 117, 123]. In general, QBPs are NP-hard combinatorial optimization problems. Even if we
know that the global optimum is unique, QBPs remain NP-hard [118]. In terms of solving general
QBPs,  we  have  witnessed  the  development  of  various  heuristics  [65,  66,  95,  111,  112,  119]  and  exact 
solution  methods. 

Most of the exact solution methods focus on efficient linearization techniques [3, 4, 25, 64,
109, 110]. The main concept of linearization is to reformulate the original QBP as an equivalent
linear mixed 0-1 problem, which can be solved efficiently with off-the-shelf mixed-integer linear
programming solvers such as the CPLEX MIP solver ( www.ilog.com/products/cplex ). Although the
proposed reformulations in the above references may differ significantly, almost all of them share
the same key idea: replacing the nonlinear terms with auxiliary variables and adding an
additional set of linear constraints. In the most commonly used linearization reformulation (i.e., the
standard linearization, as it is termed in this chapter), a new variable $z_{ij}$ is introduced to replace
each cross term $x_i x_j$, which results in the following formulation:


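\[
\begin{aligned}
(PSL):\quad \max\ & \sum_{i=1}^{n} c_i x_i + \sum_{i=1}^{n} \sum_{j > i} 2 c_{ij} z_{ij} && (37)\\
\text{s.t.}\ & \sum_{i=1}^{n} a_{ki} x_i \le b_k, \quad k = 1, \ldots, m; && (38)\\
& z_{ij} \ge x_i + x_j - 1, \quad i, j = 1, \ldots, n;\ i < j; && (39)\\
& x_i \ge z_{ij}, \quad i, j = 1, \ldots, n;\ i < j; && (40)\\
& x_j \ge z_{ij}, \quad i, j = 1, \ldots, n;\ i < j; && (41)\\
& x_i \in \{0, 1\},\ z_{ij} \ge 0, \quad i, j = 1, \ldots, n;\ i < j. && (42)
\end{aligned}
\]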
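A short check shows why constraints (39)-(42) make the linearization exact for binary x, and why its LP relaxation can be loose. The function name `z_bounds` is introduced here purely for illustration.

```python
from itertools import product

def z_bounds(x_i, x_j):
    """Interval of z_ij permitted by (39)-(42) for fixed values of x_i and x_j."""
    lower = max(0, x_i + x_j - 1)   # (39) together with z_ij >= 0 from (42)
    upper = min(x_i, x_j)           # (40) and (41)
    return lower, upper

# For binary inputs the interval collapses to the single point x_i * x_j.
linearization_is_exact = all(
    z_bounds(x_i, x_j) == (x_i * x_j, x_i * x_j)
    for x_i, x_j in product((0, 1), repeat=2)
)

# At a fractional point the interval is wide, so z_ij is no longer pinned to x_i * x_j.
fractional_bounds = z_bounds(0.5, 0.5)

print(linearization_is_exact, fractional_bounds)
```

At $x_i = x_j = 0.5$ the product is 0.25, yet any $z_{ij}$ between 0 and 0.5 is allowed, which illustrates the gap that the LP relaxation of (PSL) can leave.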

Note that in the case of constrained QBPs, additional valid inequalities can be introduced via
reformulation-linearization techniques (RLT) [138], which often improve the performance of the linearization.
Other exact solution methods use approaches based on branch-and-bound and cutting plane [79,
80, 115] techniques.

A  relatively  less  explored  class  of  methods  is  based  on  the  idea  of  making  copies  of  decision 
variables  and  introducing  additional  constraints  to  ensure  the  identity  between  each  original  decision 
variable  and  its  copies.  The  most  notable  decomposition  method  is  the  one  developed  by  Chardaire 
and  Sutter  [38]  for  unconstrained  QBPs.  Subsequently,  Billionnet  et  al.  [21,  22]  extended  their  work 
to QBPs with knapsack constraints. As a special case of their method, for each decision variable $x_i$,
an auxiliary decision variable $y_i^j$ is introduced for each $j = 1, \ldots, n$, $j \ne i$. To ensure an equivalent
reformulation, additional linear constraints $y_i^j = x_i$ are enforced for all $i, j = 1, \ldots, n$, $j \ne i$.
Therefore, in the simplest case, their reformulation is given as:


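\[
\begin{aligned}
(P1):\quad \max\ & \sum_{i=1}^{n} c_i x_i + \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} c_{ij} x_i y_j^i && (43)\\
\text{s.t.}\ & \sum_{i=1}^{n} a_{ki} x_i \le b_k, \quad k = 1, \ldots, m; && (44)\\
& y_i^j = x_i, \quad i, j = 1, \ldots, n,\ i \ne j; && (45)\\
& x_i, y_j^i \in \{0, 1\}, \quad i, j = 1, \ldots, n,\ i \ne j.
\end{aligned}
\]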

With decision variables $y$, each term $c_{ij} x_i x_j$ in objective (35) is replaced by $c_{ij} x_i y_j^i$ in (43). To solve
reformulation (43)-(46), Chardaire and Sutter [38] applied Lagrangian relaxation on constraints
(45). To improve the Lagrangian upper bounding performance, the authors also imposed on (P1)
the following set of nonlinear constraints:

\[ x_i y_j^i = x_j y_i^j, \quad i, j = 1, \ldots, n,\ i \ne j. \tag{46} \]

The authors showed that the Lagrangian relaxation bound derived when relaxing both (45) and
(46) is the same as the bound obtained by the LP relaxation of (PSL), denoted by $B_{SL}$. To solve
the Lagrangian dual problem, the authors proposed a partial enumeration method that fixes the
value of each decision variable $x$ and then solves the resultant linear programs with respect to the copy
variables $y$. Unfortunately, the partial enumeration method is not computationally appealing [124],
especially for constrained problems. The main reason for this is that it does not fully decompose
variables $x$ and $y$ by relaxing the constraint $x_i = y_i^j$ for all $i, j = 1, \ldots, n$, $i \ne j$.

In this chapter, we present a new Lagrangian decomposition based approach, which results in
an alternative Lagrangian dual problem that can be solved more efficiently. In our approach, we
still introduce a decision variable $y_j^i$ for each pair $i \ne j$, and replace each $c_{ij} x_i x_j$ with $c_{ij} x_i y_j^i$
in the objective function. However, instead of using linear constraints, i.e., constraints (45), and
quadratic constraints of simple form, i.e., constraints (46), we introduce parameterized quadratic
constraints of a more complex form. We show that we can still establish the formulation equivalence
with certain specification on the parameters of the introduced quadratic constraints even without
imposing the linear constraints $x_i = y_i^j$ for all $i, j = 1, \ldots, n$, $i \ne j$. More importantly, when we apply
Lagrangian relaxation on those quadratic constraints, the resultant Lagrangian dual problem can be
decomposed into n linear binary programs that only involve decision variables $y$ and one linear binary
program that only involves decision variables $x$. In addition, we provide properties to characterize
our Lagrangian relaxation bound. By carefully choosing the parameters of the introduced quadratic
constraints, we can guarantee that our Lagrangian bounds are better than $B_{SL}$. Our preliminary
computational experiments demonstrate the superiority of a branch-and-bound algorithm with the
proposed Lagrangian relaxation bound.

The remainder of this section is organized as follows. In Section 3.4.2, we describe the new
Lagrangian decomposition method that provides upper bounds for (P0). This subsection includes
the introduction and parameter specification of our proposed quadratic constraints, the derivation of
decomposable Lagrangian duals, and the characterization of the derived upper bounds. In Section 3.4.3,
we illustrate our method with several classic QBPs and discuss the proposed decomposition in more
detail. In Section 3.4.4, we offer some computational considerations on how to integrate our
bounding method into a branch-and-bound (B&B) framework. In Section 3.4.5, we report encouraging
results of our preliminary computational experiments on randomly generated test instances from
two classes of QBPs and compare our results with solving two well-known linearization formulations
directly via CPLEX.

3.4.2  A  Lagrangian  Decomposition  Method 

In this section, we first present our reformulation of (P0). We then show the equivalence between
(P0) and the reformulation with certain specification on the introduced quadratic constraints. With
the proposed reformulation, we finally introduce a new Lagrangian decomposition method that
allows us to obtain a Lagrangian relaxation bound by solving only linear binary programs.

An Alternative Reformulation of (P0)

Consider the following parameterized quadratic binary problem, which is constructed by including
into (P0) a set of auxiliary decision variables along with additional quadratic constraints.

\[
\begin{aligned}
(P2):\quad \max_{(x,y)}\ & \sum_{i=1}^{n} c_i x_i + \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} c_{ij} x_i y_j^i && (47)\\
\text{s.t.}\ & \sum_{i=1}^{n} a_{ki} x_i \le b_k, \quad k = 1, \ldots, m; && (48)\\
& \alpha_{ij}^l x_i y_j^i + \beta_{ij}^l x_j y_i^j + \theta_{ij}^l x_i + \gamma_{ij}^l x_j \ge e_{ij}^l, \quad i, j = 1, \ldots, n,\ i < j,\ l = 1, \ldots, r_{ij}; && (49)\\
& x_i, y_j^i \in \{0, 1\}, \quad i, j = 1, \ldots, n,\ i \ne j.
\end{aligned}
\]

In (P2), a binary decision variable $y_j^i$ is introduced for each $j \ne i$ to pair with $x_i$. Hence,
n - 1 auxiliary variables $y_j^i$ are introduced for each $x_i$. Note that we do not introduce to (P2) any
additional linear constraint that links $x$ and $y$. Instead, we introduce quadratic constraints to link
them. We let $r_{ij}$ be the number of quadratic constraints for each index pair $(i, j)$ with $i < j$ and
allow this number to differ among index pairs. Also note that objective function (47) is identical to
(43).

Next we show the equivalence between (P1) and (P2) under certain parameter specifications.
Meanwhile, it is easy to see that solving (P1) yields an optimal solution to (P0). Hence, we can
establish the validity of solving (P2). For convenience of exposition, we use the shorthand notation
$(x, y)$ to denote $(x_1, \ldots, x_n, y_2^1, \ldots, y_n^1, \ldots, y_1^n, \ldots, y_{n-1}^n)$ in the following results and proofs. We
denote by $A = \{(x, y) \in \{0,1\}^n \times \{0,1\}^{n(n-1)} \mid (44)-(46)\}$ the feasible solution region of (P1)
and by $B(\alpha, \beta, \theta, \gamma, e) = \{(x, y) \in \{0,1\}^n \times \{0,1\}^{n(n-1)} \mid (48)-(50)\}$ the feasible solution region
of (P2) parameterized on $\alpha := (\alpha_{ij}^l)$, $\beta := (\beta_{ij}^l)$, $\theta := (\theta_{ij}^l)$, $\gamma := (\gamma_{ij}^l)$, and $e := (e_{ij}^l)$. For notational
simplicity, we use $B$ instead of $B(\alpha, \beta, \theta, \gamma, e)$ when referring to a parameterized feasible solution
region of (P2). The parameters are always specified when making such a reference. We also let
$C_1 = \{(1,1,1,1), (1,0,1,0), (0,1,0,1), (0,0,0,0)\}$,
$C_2 = \{(1,0,0,0), (0,1,0,0), (0,0,0,1), (0,0,1,0), (0,0,1,1)\}$, and $C_3 = \{0,1\}^4 \setminus (C_1 \cup C_2)$,
where each 4-tuple lists the values of $(x_i, x_j, y_i^j, y_j^i)$ for an index pair $(i, j)$.
To ensure the equivalence between (P1) and (P2), we essentially need to show that $A \subseteq B$ and
no feasible solution to (P2) is from $C_3$. Therefore, the parameters have to be specified accordingly.
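The three sets can be enumerated to see what distinguishes them. The sketch below assumes the tuple ordering $(x_i, x_j, y_i^j, y_j^i)$. The characterization of $C_1 \cup C_2$ that it checks is an observation of this sketch rather than a statement taken from the text.

```python
from itertools import product

# Tuples are ordered as (x_i, x_j, y_i^j, y_j^i).
C1 = {(1, 1, 1, 1), (1, 0, 1, 0), (0, 1, 0, 1), (0, 0, 0, 0)}
C2 = {(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0), (0, 0, 1, 1)}
ALL = set(product((0, 1), repeat=4))
C3 = ALL - (C1 | C2)

def copies_match(t):
    """True when both copy variables equal the originals they duplicate."""
    x_i, x_j, y_ij, y_ji = t
    return y_ij == x_i and y_ji == x_j

def preserves_products(t):
    """True when x_i*y_j^i and x_j*y_i^j both equal the true product x_i*x_j."""
    x_i, x_j, y_ij, y_ji = t
    return x_i * y_ji == x_i * x_j and x_j * y_ij == x_i * x_j

matching = {t for t in ALL if copies_match(t)}
harmless = {t for t in ALL if preserves_products(t)}

print(sorted(C3))
```

Under this reading, $C_1$ collects the tuples whose copies agree with the originals, $C_1 \cup C_2$ collects the tuples whose objective contribution equals that of the true product $x_i x_j$, and the seven remaining tuples in $C_3$ are those that would distort the objective.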

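Proposition 1 If for all $(i, j)$, $i, j = 1, \ldots, n$ and $i < j$, the following conditions hold:

\[
\begin{aligned}
\alpha_{ij}^l + \beta_{ij}^l + \theta_{ij}^l + \gamma_{ij}^l &\ge e_{ij}^l, && (50)\\
\theta_{ij}^l &\ge e_{ij}^l, && (51)\\
\gamma_{ij}^l &\ge e_{ij}^l, && (52)\\
e_{ij}^l &\le 0, && (53)
\end{aligned}
\]

for all $l = 1, \ldots, r_{ij}$, then $A \subseteq B$.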

Proof: Given $(x, y) \in A$, it is easy to see that $(x_i, x_j, y_i^j, y_j^i) \in C_1$ for every index pair $(i, j)$.
Hence, a sufficient condition for $A \subseteq B$ is to ensure that all four vectors in $C_1$ satisfy constraint (49)
when conditions (50)-(53) hold for all $l = 1, \ldots, r_{ij}$. We consider four cases to verify the sufficient
condition and summarize the verification in the following table:

$(x_i, x_j, y_i^j, y_j^i)$    Constraint (49)                                                                  Ensured by condition
(1,1,1,1)                     $\alpha_{ij}^l + \beta_{ij}^l + \theta_{ij}^l + \gamma_{ij}^l \ge e_{ij}^l$      (50)
(1,0,1,0)                     $\theta_{ij}^l \ge e_{ij}^l$                                                     (51)
(0,1,0,1)                     $\gamma_{ij}^l \ge e_{ij}^l$                                                     (52)
(0,0,0,0)                     $0 \ge e_{ij}^l$                                                                 (53)

For example, if $(x_i, x_j, y_i^j, y_j^i) = (1, 1, 1, 1)$, each constraint (49) reduces to $\alpha_{ij}^l + \beta_{ij}^l + \theta_{ij}^l + \gamma_{ij}^l \ge e_{ij}^l$,
which coincides with condition (50). The proposition then follows as any solution $(x, y)$ in $A$ is also
in $B$. □

Proposition 1 implies that with the parameter specification in (50)-(53), we ensure that any
combination in $C_1$ satisfies all corresponding constraints (49) for every index pair.

Proposition 2 Consider any $(x, y) \in B$. If $(x_i, x_j, y_i^j, y_j^i) \in C_1 \cup C_2$ for each $(i, j)$, $i, j = 1, \ldots, n$
and $i < j$, then there exists $(\bar{x}, \bar{y}) \in A$ such that $(x, y)$ in (P2) and $(\bar{x}, \bar{y})$ in (P1) yield the same
objective function value.

Proof: Given $(x, y) \in B$, we consider two cases to construct $(\bar{x}, \bar{y})$. For each index pair $(i, j)$,
1) if $(x_i, x_j, y_i^j, y_j^i) \in C_1$, we simply let $(\bar{x}_i, \bar{x}_j, \bar{y}_i^j, \bar{y}_j^i) = (x_i, x_j, y_i^j, y_j^i)$; 2) if $(x_i, x_j, y_i^j, y_j^i) \in C_2$, we
construct $(\bar{x}_i, \bar{x}_j, \bar{y}_i^j, \bar{y}_j^i)$ in the following manner:

$(x_i, x_j, y_i^j, y_j^i)$    $(\bar{x}_i, \bar{x}_j, \bar{y}_i^j, \bar{y}_j^i)$
(1,0,0,0)                     (1,0,1,0)
(0,1,0,0)                     (0,1,0,1)
(0,0,0,1)                     (0,0,0,0)
(0,0,1,0)                     (0,0,0,0)
(0,0,1,1)                     (0,0,0,0)

It is easy to see that $(\bar{x}, \bar{y}) \in A$ with the above construction, and that $\bar{x} = x$. Regarding the objective functions, it
is clear that for any pair $(i, j)$, $c_{ij} x_i y_j^i + c_{ji} x_j y_i^j = c_{ij} \bar{x}_i \bar{y}_j^i + c_{ji} \bar{x}_j \bar{y}_i^j$ given $(x, y) \in B$ and $(\bar{x}, \bar{y}) \in A$.
The proposition then follows as the two objective function values are equal. □
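The case-2 construction in the proof can be checked mechanically. The sketch below assumes the tuple ordering $(x_i, x_j, y_i^j, y_j^i)$ and uses a dictionary named `REPAIR` (a name chosen here) for the mapping from $C_2$ into $C_1$.

```python
# Tuples are ordered as (x_i, x_j, y_i^j, y_j^i).
C1 = {(1, 1, 1, 1), (1, 0, 1, 0), (0, 1, 0, 1), (0, 0, 0, 0)}

# Mapping used in case 2 of the proof: each C2 tuple and its replacement.
REPAIR = {
    (1, 0, 0, 0): (1, 0, 1, 0),
    (0, 1, 0, 0): (0, 1, 0, 1),
    (0, 0, 0, 1): (0, 0, 0, 0),
    (0, 0, 1, 0): (0, 0, 0, 0),
    (0, 0, 1, 1): (0, 0, 0, 0),
}

def objective_factors(t):
    """The products x_i*y_j^i and x_j*y_i^j that multiply c_ij and c_ji."""
    x_i, x_j, y_ij, y_ji = t
    return (x_i * y_ji, x_j * y_ij)

same_x = all(src[:2] == dst[:2] for src, dst in REPAIR.items())
lands_in_c1 = all(dst in C1 for dst in REPAIR.values())
same_objective = all(objective_factors(src) == objective_factors(dst)
                     for src, dst in REPAIR.items())

print(same_x, lands_in_c1, same_objective)
```

All three checks hold: the replacement leaves $x$ untouched, produces a tuple in $C_1$, and changes neither product that enters the objective, regardless of the values of $c_{ij}$ and $c_{ji}$.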

Proposition 2 implies that although $A \subseteq B$ allows more feasible solutions to (P2) than to (P1), such
an expansion of the feasible solution region does not affect the optimality equivalence between (P1) and (P2)
as long as $(x_i, x_j, y_i^j, y_j^i) \in C_1 \cup C_2$ for all $(i, j)$, $1 \le i < j \le n$. Next, we show that certain expansions
should not be allowed, as they may affect the optimality equivalence between (P1) and (P2). Thus, we
should prohibit such expansions by specifying the parameters in constraint (49). In the following, we
provide a sufficient condition to ensure this.

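Proposition 3 Suppose for some $(s, t)$, $s, t = 1, \ldots, n$ and $s < t$, there exists an index set $L_{st} :=
(l_1, l_2, l_3, l_4, l_5) \in \{1, \ldots, r_{st}\}^5$ such that the following conditions hold:

\[
\begin{aligned}
\alpha_{st}^{l_1} + \theta_{st}^{l_1} + \gamma_{st}^{l_1} &< e_{st}^{l_1}, && (54)\\
\beta_{st}^{l_2} + \theta_{st}^{l_2} + \gamma_{st}^{l_2} &< e_{st}^{l_2}, && (55)\\
\theta_{st}^{l_3} + \gamma_{st}^{l_3} &< e_{st}^{l_3}, && (56)\\
\alpha_{st}^{l_4} + \theta_{st}^{l_4} &< e_{st}^{l_4}, && (57)\\
\beta_{st}^{l_5} + \gamma_{st}^{l_5} &< e_{st}^{l_5}. && (58)
\end{aligned}
\]

For any $(x, y)$, if $(x_s, x_t, y_s^t, y_t^s) \in C_3$, then $(x, y) \notin B$.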

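Proof: Given $(s, t)$ with $(x_s, x_t, y_s^t, y_t^s) \in C_3$, we check in the following table whether such $(x_s, x_t, y_s^t, y_t^s)$
violates constraint (49) with a set of parameters specified in (54)-(58) and $(l_1, l_2, l_3, l_4, l_5)$.

$(x_s, x_t, y_s^t, y_t^s)$    Constraint (49)                                                            Violated by condition
(1,1,0,1)                     $\alpha_{st}^{l_1} + \theta_{st}^{l_1} + \gamma_{st}^{l_1} \ge e_{st}^{l_1}$    (54)
(1,1,1,0)                     $\beta_{st}^{l_2} + \theta_{st}^{l_2} + \gamma_{st}^{l_2} \ge e_{st}^{l_2}$     (55)
(1,1,0,0)                     $\theta_{st}^{l_3} + \gamma_{st}^{l_3} \ge e_{st}^{l_3}$                        (56)
(1,0,0,1)                     $\alpha_{st}^{l_4} + \theta_{st}^{l_4} \ge e_{st}^{l_4}$                        (57)
(1,0,1,1)                     $\alpha_{st}^{l_4} + \theta_{st}^{l_4} \ge e_{st}^{l_4}$                        (57)
(0,1,1,0)                     $\beta_{st}^{l_5} + \gamma_{st}^{l_5} \ge e_{st}^{l_5}$                         (58)
(0,1,1,1)                     $\beta_{st}^{l_5} + \gamma_{st}^{l_5} \ge e_{st}^{l_5}$                         (58)

Note that the seven cases of $(x_s, x_t, y_s^t, y_t^s)$ in the above table show all possible combinations in
$C_3$. The violation checking in the table indicates that $(x_s, x_t, y_s^t, y_t^s)$ cannot satisfy (49). Hence,
$(x, y) \notin B$. □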

Note that Proposition 3 allows $l_p = l_q$ for $1 \le p \ne q \le 5$, and the choice of $L_{st}$ may
not be unique for a given index pair $(s, t)$. Proposition 3 implies that $B$ should not contain any
$(x, y)$ with a component equal to any of the seven combinations in $C_3$, since such a solution may destroy
the optimality equivalence between (P1) and (P2). With the three propositions presented above,
we can readily state one sufficient condition to ensure that (P2) is a valid reformulation of (P0).
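Theorem 1 The two formulations (P1) and (P2) are equivalent in the sense that they yield the
same optimal objective function value and the same optimal solution with respect to $x$ if there exists
$(\alpha, \beta, \theta, \gamma, e)$ such that for any index pair $(i, j)$, $i, j = 1, \ldots, n$ and $i < j$, the following conditions
hold:

1. for all $l = 1, \ldots, r_{ij}$,

\[
\begin{aligned}
\alpha_{ij}^l + \beta_{ij}^l + \theta_{ij}^l + \gamma_{ij}^l &\ge e_{ij}^l, && (59)\\
\theta_{ij}^l &\ge e_{ij}^l, && (60)\\
\gamma_{ij}^l &\ge e_{ij}^l, && (61)\\
e_{ij}^l &\le 0; && (62)
\end{aligned}
\]

2. there exists an index set $L_{ij}$, i.e., $(l_1^{ij}, l_2^{ij}, l_3^{ij}, l_4^{ij}, l_5^{ij})$, such that

\[
\begin{aligned}
\alpha_{ij}^{l_1^{ij}} + \theta_{ij}^{l_1^{ij}} + \gamma_{ij}^{l_1^{ij}} &< e_{ij}^{l_1^{ij}}, && (63)\\
\beta_{ij}^{l_2^{ij}} + \theta_{ij}^{l_2^{ij}} + \gamma_{ij}^{l_2^{ij}} &< e_{ij}^{l_2^{ij}}, && (64)\\
\theta_{ij}^{l_3^{ij}} + \gamma_{ij}^{l_3^{ij}} &< e_{ij}^{l_3^{ij}}, && (65)\\
\alpha_{ij}^{l_4^{ij}} + \theta_{ij}^{l_4^{ij}} &< e_{ij}^{l_4^{ij}}, && (66)\\
\beta_{ij}^{l_5^{ij}} + \gamma_{ij}^{l_5^{ij}} &< e_{ij}^{l_5^{ij}}. && (67)
\end{aligned}
\]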

Proof: Let $f_1^*$ and $f_2^*$ be the optimal objective function values of (P1) and (P2), respectively. From
Proposition 1, we have $A \subseteq B$, and thus $f_1^* \le f_2^*$. Propositions 2 and 3 show that for any $(x, y) \in B$,
there exists $(\bar{x}, \bar{y}) \in A$ that yields the same objective function value as $(x, y)$, which implies
that $f_2^* \le f_1^*$, and thus the result follows. Furthermore, given any $(x, y) \in B$, the construction of
$(\bar{x}, \bar{y}) \in A$ in Proposition 2 ensures that the solutions are always identical with respect to $x$, which
completes the proof. □

Corollary 1 The formulation (P2) is a valid reformulation of (P0) with the parameter specification
given in Theorem 1.

Theorem 1 and Corollary 1 establish the fact that with appropriate parameter specification in
constraints (49), (P2) is a valid reformulation of (P0). Thus one may solve (P2) instead of (P0).
Two issues remain for the algorithmic development. One issue is how to solve (P2) efficiently. The
other is how to specify the parameters in constraints (49) so that we ensure not only the formulation
equivalence but also the solution efficiency in practice. In the next subsection, we develop a Lagrangian
decomposition method to improve the efficiency of solving (P2). In the subsection after that, we propose some
practical specifications of the parameters in constraints (49) and discuss their resultant Lagrangian
relaxation bounds.
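The conditions of Theorem 1 are simple enough to test for a given index pair. The sketch below checks (59)-(62) and the existence part (63)-(67), and compares the outcome with a direct enumeration of the 16 tuples $(x_i, x_j, y_i^j, y_j^i)$. The two parameter sets at the bottom are hypothetical illustrative values, and the function names are chosen here.

```python
from fractions import Fraction
from itertools import product

C1 = {(1, 1, 1, 1), (1, 0, 1, 0), (0, 1, 0, 1), (0, 0, 0, 0)}
C2 = {(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0), (0, 0, 1, 1)}

def satisfies(p, t):
    """Constraint (49) for one parameter vector p = (alpha, beta, theta, gamma, e)."""
    alpha, beta, theta, gamma, e = p
    x_i, x_j, y_ij, y_ji = t          # y_ij stands for y_i^j, y_ji for y_j^i
    return alpha * x_i * y_ji + beta * x_j * y_ij + theta * x_i + gamma * x_j >= e

def feasible_tuples(params):
    """All 4-tuples that satisfy every constraint (49) of one index pair."""
    return {t for t in product((0, 1), repeat=4)
            if all(satisfies(p, t) for p in params)}

def theorem1_conditions_hold(params):
    # (59)-(62) must hold for every constraint l.
    keeps_c1 = all(a + b + th + g >= e and th >= e and g >= e and e <= 0
                   for a, b, th, g, e in params)
    # (63)-(67): each pattern must be violated by at least one constraint.
    cuts_c3 = (any(a + th + g < e for a, b, th, g, e in params)       # (63)
               and any(b + th + g < e for a, b, th, g, e in params)   # (64)
               and any(th + g < e for a, b, th, g, e in params)       # (65)
               and any(a + th < e for a, b, th, g, e in params)       # (66)
               and any(b + g < e for a, b, th, g, e in params))       # (67)
    return keeps_c1 and cuts_c3

# Hypothetical parameter values for one index pair (exact rational arithmetic).
e = Fraction(-1, 2)
two_constraints = [(1, -1 - e, e, e, e), (-1 - e, 1, e, e, e)]
one_constraint = two_constraints[:1]

print(theorem1_conditions_hold(two_constraints), theorem1_conditions_hold(one_constraint))
```

For the two-constraint set the conditions hold and the feasible tuples are exactly $C_1 \cup C_2$, whereas dropping one constraint lets some tuples of $C_3$ back in.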

Lagrangian  Decomposition  in  (P2) 

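In this section, we present a Lagrangian decomposition method to solve the reformulation (P2). We
assume that the parameters in constraints (49) have been specified to ensure the validity of (P2).
For each pair $1 \le i < j \le n$ and each $l = 1, \ldots, r_{ij}$, we associate a Lagrangian multiplier $\lambda_{ij}^l \ge 0$ with
the corresponding constraint (49). Then the Lagrangian relaxation of (P2), denoted by $L(\lambda)$, is

\[ L(\lambda) = \max_{x, y} \{ L(\lambda, x, y) \mid (48), (50) \}, \tag{68} \]

where, with $f(x, y)$ denoting objective function (47),

\[
\begin{aligned}
L(\lambda, x, y) &= f(x, y) + \sum_{i=1}^{n} \sum_{j > i} \sum_{l=1}^{r_{ij}} \lambda_{ij}^l \left( \alpha_{ij}^l x_i y_j^i + \beta_{ij}^l x_j y_i^j + \theta_{ij}^l x_i + \gamma_{ij}^l x_j - e_{ij}^l \right) \\
&= \sum_{i=1}^{n} \Bigg( c_i + \sum_{j > i} \sum_{l=1}^{r_{ij}} \lambda_{ij}^l \theta_{ij}^l + \sum_{j < i} \sum_{l=1}^{r_{ji}} \lambda_{ji}^l \gamma_{ji}^l
 + \sum_{j > i} \Big( c_{ij} + \sum_{l=1}^{r_{ij}} \lambda_{ij}^l \alpha_{ij}^l \Big) y_j^i
 + \sum_{j < i} \Big( c_{ij} + \sum_{l=1}^{r_{ji}} \lambda_{ji}^l \beta_{ji}^l \Big) y_j^i \Bigg) x_i \\
&\quad - \sum_{i=1}^{n} \sum_{j > i} \sum_{l=1}^{r_{ij}} \lambda_{ij}^l e_{ij}^l,
\end{aligned}
\tag{69}
\]

and the Lagrangian dual bound of (P2), denoted by $B_{LD}$, is:

\[ B_{LD} = \min_{\lambda \ge 0} L(\lambda). \tag{70} \]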


Lemma 1 Consider a function $f(x, y)$ of the form $f(x, y) = g(y) h(x)$ with $x \in X$ and $y \in Y$. If
$h(x) \ge 0$ for all $x \in X$, then $\max_{x \in X, y \in Y} f(x, y) = \max_{x \in X, y \in Y} g(y) h(x) = \max_{x \in X} \{ (\max_{y \in Y} g(y)) h(x) \}$.

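With Lemma 1, we can further derive $L(\lambda)$ as:

\[
\begin{aligned}
L(\lambda) = \max_{x, y} L(\lambda, x, y)
&= \max_{x, y} \left\{ \sum_{i=1}^{n} g_i(y^i) x_i - \sum_{i=1}^{n} \sum_{j > i} \sum_{l=1}^{r_{ij}} \lambda_{ij}^l e_{ij}^l \right\} \\
&= \max_{x} \left\{ \sum_{i=1}^{n} \Big( \max_{y^i} g_i(y^i) \Big) x_i - \sum_{i=1}^{n} \sum_{j > i} \sum_{l=1}^{r_{ij}} \lambda_{ij}^l e_{ij}^l \right\},
\end{aligned}
\tag{71}
\]

where

\[
g_i(y^i) = c_i + \sum_{j > i} \sum_{l=1}^{r_{ij}} \lambda_{ij}^l \theta_{ij}^l + \sum_{j < i} \sum_{l=1}^{r_{ji}} \lambda_{ji}^l \gamma_{ji}^l
 + \sum_{j > i} \Big( c_{ij} + \sum_{l=1}^{r_{ij}} \lambda_{ij}^l \alpha_{ij}^l \Big) y_j^i
 + \sum_{j < i} \Big( c_{ij} + \sum_{l=1}^{r_{ji}} \lambda_{ji}^l \beta_{ji}^l \Big) y_j^i
\tag{72}
\]

for $i = 1, \ldots, n$. Note that we drop the constraints in (71) for simplicity of exposition. Also
note that the mutual independence among the $g_i(y^i)$, along with Lemma 1, ensures the second equality
in (71). Hence, we can decompose $L(\lambda)$ into n unconstrained linear binary problems and one linear
binary program. For each $i = 1, \ldots, n$, let

\[ g_i^* = \max_{y^i} g_i(y^i), \tag{73} \]

which is an unconstrained linear binary program only involving $y^i$. Then $L(\lambda)$ is further derived as:

\[
L(\lambda) = \max_{x} \left\{ \sum_{i=1}^{n} g_i^* x_i - \sum_{i=1}^{n} \sum_{j > i} \sum_{l=1}^{r_{ij}} \lambda_{ij}^l e_{ij}^l \;\middle|\; (48), (50) \right\},
\tag{74}
\]

which is a linear binary program only involving $x$. We call (74) the master subproblem after
decomposition (MSAD).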
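The decomposition in (72)-(74) can be sketched as follows, assuming all data are stored in plain Python containers. Each $g_i^*$ is obtained in closed form by setting $y_j^i = 1$ exactly when its coefficient is positive, and MSAD is solved here by enumeration, which is a stand-in for an integer programming solver and is only viable for tiny n. The instance data, the multipliers, and the function names are hypothetical. The second function maximizes $L(\lambda, x, y)$ over all binary $(x, y)$ as a reference value.

```python
from fractions import Fraction
from itertools import product

def knapsack_ok(a, b, x):
    return all(sum(a_k[i] * x[i] for i in range(len(x))) <= b_k
               for a_k, b_k in zip(a, b))

def g_star(i, n, c_lin, c_quad, params, lam):
    """g_i^* of (73): constant part of (72) plus every positive y-coefficient."""
    value = c_lin[i]
    for j in range(n):
        if j == i:
            continue
        pair = (min(i, j), max(i, j))
        coef = c_quad[i][j]                       # coefficient of y_j^i
        for (alpha, beta, theta, gamma, _), mult in zip(params[pair], lam[pair]):
            if i < j:                             # x_i is the first index of the pair
                value += mult * theta
                coef += mult * alpha
            else:                                 # x_i is the second index of the pair
                value += mult * gamma
                coef += mult * beta
        value += max(0, coef)                     # optimal y_j^i is 1 iff coef > 0
    return value

def bound_by_decomposition(n, c_lin, c_quad, a, b, params, lam):
    """L(lambda) as in (74); MSAD is enumerated, which only suits tiny n."""
    g = [g_star(i, n, c_lin, c_quad, params, lam) for i in range(n)]
    offset = sum(mult * p[4]
                 for pair in params for p, mult in zip(params[pair], lam[pair]))
    return max(sum(g[i] * x[i] for i in range(n))
               for x in product((0, 1), repeat=n) if knapsack_ok(a, b, x)) - offset

def bound_by_enumeration(n, c_lin, c_quad, a, b, params, lam):
    """Reference value: maximize L(lambda, x, y) over every binary (x, y)."""
    slots = [(i, j) for i in range(n) for j in range(n) if i != j]
    best = None
    for x in product((0, 1), repeat=n):
        if not knapsack_ok(a, b, x):
            continue
        for y_vals in product((0, 1), repeat=len(slots)):
            y = dict(zip(slots, y_vals))          # y[(i, j)] stands for y_j^i
            val = sum(c_lin[i] * x[i] for i in range(n))
            val += sum(c_quad[i][j] * x[i] * y[(i, j)] for i, j in slots)
            for (i, j), plist in params.items():
                for (al, be, th, ga, e), mult in zip(plist, lam[(i, j)]):
                    val += mult * (al * x[i] * y[(i, j)] + be * x[j] * y[(j, i)]
                                   + th * x[i] + ga * x[j] - e)
            best = val if best is None else max(best, val)
    return best

# Hypothetical instance: 3 variables, one knapsack row, two constraints (49) per pair.
n = 3
c_lin = [1, 2, 3]
c_quad = [[0, 2, -1], [2, 0, 2], [-1, 2, 0]]
a, b = [[1, 1, 1]], [2]
e = Fraction(-1, 2)
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
params = {p: [(1, -1 - e, e, e, e), (-1 - e, 1, e, e, e)] for p in pairs}
lam = {p: [Fraction(1, 2), Fraction(3, 4)] for p in pairs}

bound_decomposed = bound_by_decomposition(n, c_lin, c_quad, a, b, params, lam)
bound_enumerated = bound_by_enumeration(n, c_lin, c_quad, a, b, params, lam)
print(bound_decomposed, bound_enumerated)
```

The two values coincide, as Lemma 1 predicts, and both are no smaller than the optimal value 9 of this small (P0) instance, consistent with $L(\lambda)$ being an upper bound for any $\lambda \ge 0$.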

By applying Lemma 1, we can compute $L(\lambda)$ by solving n unconstrained linear binary problems,
each of which corresponds to $y^i$, $i = 1, \ldots, n$, and one linear binary program, which corresponds
to $x$. It is easy to solve each unconstrained integer program (IP) $\max_{y^i} g_i(y^i)$ with n - 1 binary
variables. Thus, generally speaking, the complexity of solving $L(\lambda)$ is determined by the complexity
of solving MSAD. In addition, we note that MSAD preserves the feasible solution region of the
original problem (P0), whereas linearization methods introduce additional constraints to (P0) (e.g.,
[64, 68, 109]), which may significantly complicate the original problem structure. With this
structure-preserving feature, our method is expected to be efficient, especially for problems with special
structures. Furthermore, it is expected to be beneficial to embed our method within a
branch-and-bound framework, as indicated in the following remark.

Remark 4 For any given $\lambda \ge 0$, solving MSAD provides a feasible solution to (P0) while $L(\lambda)$
offers an upper bound to (P0).

In a Lagrangian decomposition based branch-and-bound framework, it may be computationally
useful to terminate the solution of a Lagrangian relaxation problem before it reaches optimality.
In our case, we terminate a subgradient algorithm for solving problem (70) prematurely and use
the obtained Lagrangian dual value as the upper bound instead of $B_{LD}$. Note that we also obtain a
potentially promising feasible solution at the same time as the upper bound. In Section 3.4.4,
we state computational considerations on solving the Lagrangian duals via the subgradient method
and discuss their effects in a branch-and-bound framework.

Parameter  Specification  in  Quadratic  Constraints  (49) 

To ensure the equivalence of (P0) and (P2), we need to specify the parameters in the set of quadratic
constraints (49). Theorem 1 provides a sufficient condition on the parameter specification. However,
that sufficient condition leads to infinitely many feasible parameter specifications. In this section,
we assume a constant number of quadratic constraints for each index pair, i.e., $r_{ij} = r$ for
all $1 \le i < j \le n$, and discuss only some particular subsets of the parameter space. We consider
r = 1, 2, 6 and identify some characterizations of the obtained quadratic constraints.

Case  r  =  1 

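As indicated in Propositions 1 and 3, a necessary condition for a valid specification of $(\alpha, \beta, \theta, \gamma, e)$
is that for any index pair $(i, j)$ with $1 \le i < j \le n$, the corresponding quadratic constraint (49),
$\alpha_{ij} x_i y_j^i + \beta_{ij} x_j y_i^j + \theta_{ij} x_i + \gamma_{ij} x_j \ge e_{ij}$, is satisfied by $(x_i, x_j, y_i^j, y_j^i) = (1, 1, 1, 1)$ and $(0, 0, 0, 0)$ but
violated by $(x_i, x_j, y_i^j, y_j^i) = (0, 1, 1, 1)$ and $(1, 0, 0, 1)$. Therefore, the parameters $(\alpha_{ij}, \beta_{ij}, \theta_{ij}, \gamma_{ij}, e_{ij})$
must satisfy the following conditions:

\[
\begin{aligned}
\alpha_{ij} + \beta_{ij} + \theta_{ij} + \gamma_{ij} &\ge e_{ij}; \\
e_{ij} &\le 0; \\
\beta_{ij} + \gamma_{ij} &< e_{ij}; \\
\alpha_{ij} + \theta_{ij} &< e_{ij},
\end{aligned}
\]

for $1 \le i < j \le n$. These four conditions cannot be satisfied simultaneously: adding the last two gives
$\alpha_{ij} + \beta_{ij} + \theta_{ij} + \gamma_{ij} < 2 e_{ij}$, which together with the first yields $e_{ij} < 2 e_{ij}$, i.e., $e_{ij} > 0$, contradicting
the second. This leads to the following remark.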

Remark 5 A necessary condition to ensure the equivalence between (P0) and (P2) is $r_{ij} \ge 2$ for
any index pair $(i, j)$ with $1 \le i < j \le n$, in the parameter specification of quadratic constraints (49).
Case  r  =  2


In the case r = 2, there are two quadratic constraints (49) for each index pair, which requires us
to specify twice as many parameters as in the case r = 1. This gives us more freedom to specify
parameters that satisfy the conditions in Theorem 1. Namely, we must find a set of parameters
such that (59)-(62) and (63)-(67) hold simultaneously for each index pair. We next present a valid
way to specify the parameters to ensure the equivalence between (P1) and (P2).

For each index pair $(i, j)$, $1 \le i < j \le n$, let $\alpha_{ij} := \alpha_{ij}^1 = \beta_{ij}^2$, $\beta_{ij} := \alpha_{ij}^2 = \beta_{ij}^1$, $\theta_{ij} := \theta_{ij}^1 = \gamma_{ij}^2$,
$\gamma_{ij} := \theta_{ij}^2 = \gamma_{ij}^1$, and $e_{ij} := e_{ij}^1 = e_{ij}^2$. Hence, conditions (59)-(62) can be further derived as:

\[
\begin{aligned}
\alpha_{ij} + \beta_{ij} + \theta_{ij} + \gamma_{ij} &\ge e_{ij}; && (75)\\
\theta_{ij} &\ge e_{ij}; && (76)\\
\gamma_{ij} &\ge e_{ij}; && (77)\\
e_{ij} &\le 0, && (78)
\end{aligned}
\]

for each $(i, j)$, $1 \le i < j \le n$. Furthermore, we set $l_1 = 2$, $l_2 = 1$, $l_3 = 1$, $l_4 = 2$, $l_5 = 1$ for each $(i, j)$,
$1 \le i < j \le n$. Hence, conditions (63)-(67) can be further derived as:

\[
\begin{aligned}
\beta_{ij} + \theta_{ij} + \gamma_{ij} &< e_{ij}; && (79)\\
\theta_{ij} + \gamma_{ij} &< e_{ij}; && (80)\\
\beta_{ij} + \gamma_{ij} &< e_{ij}, && (81)
\end{aligned}
\]

for each $(i, j)$, $1 \le i < j \le n$. Note that given the above specification on $(\alpha, \beta, \theta, \gamma, e)$, with $l_1 = 2$
and $l_2 = 1$, conditions (63) and (64) are identical and become (79); and with $l_4 = 2$ and $l_5 = 1$,
conditions (66) and (67) are identical and become (81). Finally, in order to find a valid set of
parameters, (78) has to be tightened to $e_{ij} < 0$. This is because (76), (77), and (80) together imply
$e_{ij} > 2 e_{ij}$. Hence, we replace (78) with

\[ e_{ij} < 0, \tag{82} \]

for each $(i, j)$, $1 \le i < j \le n$, in the following discussion on the case r = 2.

Remark 6 Given the selection of $(l_1, l_2, l_3, l_4, l_5)$ as above, for each index pair $(i, j)$, $1 \le i < j \le n$,
any solution $(\alpha_{ij}, \beta_{ij}, \theta_{ij}, \gamma_{ij}, e_{ij})$ that satisfies the set of inequalities (75)-(77), (79)-(81), and (82)
is valid in terms of parameter specification. Piecing satisfactory solutions together for all index pairs
ensures the equivalence between (P1) and (P2).

Remark 7 In the case r = 2, once conditions (75)-(77) and (82) are satisfied, one can find a
valid set of parameters to ensure the equivalence between (P1) and (P2) as long as $l_1 \ne l_2$ and $l_4 \ne l_5$ when
selecting $(l_1, l_2, l_3, l_4, l_5)$.

The  above  two  remarks  imply  that  there  are  still  a  large  number  of  valid  specifications  to  ensure 
the  equivalence  between  (PI)  and  (P2).  In  the  following,  we  fix  the  selection  on  (Ii,l2,h,h,l5)  as 
used  earlier  for  conditions  (79)— (81).  However,  we  further  simply  the  selection  on  (a,  (3,6, 7,  e)  with 
the  goal  of  improving  computational  efficiency  in  solving  the  Lagrangian  relaxation  problem  (70). 
To  solve  (70),  we  apply  the  subgradient  method  in  Held  and  Wolfe  [78],  with  which  Lagrangian 
multipliers  A  are  updated  at  every  iteration  in  the  following  fashion.  For  each  (i,j)  with  1  <  i  < 
j  <  n  and  1  =  1,2, 

(X[j)k+1  =  max{0,  (A \j)k  +  s(e\3  -  a^y)  -  ■i{ixjy!  -  d^a 7  -  77^)}, 
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(83) 


where k is the iteration index and s is the step size. In general, it is desirable to update the Lagrangian multipliers only when $(x_i, x_j, y_j^i, y_i^j)$ is infeasible to (P2). In other words, the fact that $(x_i, x_j, y_j^i, y_i^j)$ satisfies (49) suggests that $(\lambda_{ij}^l)^k$ is likely to be sufficiently good, and thus there is no need to modify it further. Therefore, conditions (75)-(77) should be satisfied with equality whenever possible, which leads to the following. For each $(i,j)$, $1 \le i < j \le n$, and $l = 1, 2$, we set $\theta_{ij}^l = \gamma_{ij}^l = \epsilon_{ij}^l = \epsilon$ and $\beta_{ij}^l = -(\alpha_{ij}^l + \epsilon_{ij}^l)$. Note that with the above partial parameter specification, we still need to set $\alpha_{ij}^l$. Without loss of generality, we arbitrarily let $\alpha_{ij}^1 = \beta_{ij}^2 = 1$ and $\alpha_{ij}^2 = \beta_{ij}^1 = -1 - \epsilon$ for all $(i,j)$. For each $1 \le i < j \le n$, we thus have

$$x_i(y_j^i + \epsilon) + x_j((-1 - \epsilon) y_i^j + \epsilon) \ge \epsilon; \qquad (84)$$

$$x_i((-1 - \epsilon) y_j^i + \epsilon) + x_j(y_i^j + \epsilon) \ge \epsilon. \qquad (85)$$

Finally, to satisfy (81) and (82), we must have $-1 < \epsilon < 0$. It is easy to check that the above parameter specification satisfies the conditions in Theorem 1 and thus ensures the equivalence between (P1) and (P2), with (84) and (85) in place of (49) for each $(i,j)$, $1 \le i < j \le n$. Given its simplicity, we use the above specification in all our computational experiments. Our preliminary computational results show that, although we are unlikely to obtain a Lagrangian relaxation bound as good as $B_{SL}$, the bound obtained by solving the LP relaxation of (PSL), the computational benefit due to Lagrangian decomposition offsets the bounding inferiority. However, this inferiority motivates us to study cases where r > 2.
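As a concrete illustration of update (83) under the r = 2 specification, the following minimal Python sketch evaluates the left-hand sides of (84) and (85) for a single index pair and applies the projected step. The function names and the scalar, single-pair representation are illustrative assumptions and are not taken from the original implementation.

```python
def constraint_values(x_i, x_j, y_ij, y_ji, eps):
    """Left-hand sides of (84) and (85) for one pair (i, j).

    y_ij stands for y_j^i (the copy of x_j paired with x_i) and
    y_ji stands for y_i^j (the copy of x_i paired with x_j).
    """
    lhs_84 = x_i * (y_ij + eps) + x_j * ((-1 - eps) * y_ji + eps)
    lhs_85 = x_i * ((-1 - eps) * y_ij + eps) + x_j * (y_ji + eps)
    return lhs_84, lhs_85


def update_multipliers(lam, x_i, x_j, y_ij, y_ji, eps, step):
    """One projected step of (83): lam <- max(0, lam + step * (eps - lhs))."""
    values = constraint_values(x_i, x_j, y_ij, y_ji, eps)
    return tuple(max(0.0, l + step * (eps - v)) for l, v in zip(lam, values))


# Consistent copies (y_ij = x_j, y_ji = x_i): both constraints hold with
# equality, so the multipliers are left unchanged.
print(update_multipliers((0.5, 0.5), 1, 1, 1, 1, eps=-0.5, step=1.0))

# Inconsistent copy (y_ij = 0 although x_j = 1): (84) is violated and its
# multiplier grows, while the multiplier of (85) remains at 0 after projection.
print(update_multipliers((0.0, 0.0), 1, 1, 0, 1, eps=-0.5, step=1.0))
```

The first call shows why setting the parameters so that feasible points satisfy the constraints with equality is convenient: a feasible point produces a zero subgradient component, so a good multiplier is not disturbed.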

Case  r  =  6 

As r increases, identifying a set of parameters $(\alpha, \beta, \theta, \gamma, \epsilon)$ that satisfies the sufficient condition in Theorem 1, and thus ensures the equivalence of (P1) and (P2), becomes less of a concern. In the case r = 2, we explored how to improve the computational efficiency of solving the proposed Lagrangian relaxation problem. In this section, we investigate good parameter specifications such that the proposed Lagrangian relaxation bound can outperform $B_{SL}$.

We present a special case of r = 6, with which we show that one can generate an upper bound on (P0) at least as tight as $B_{SL}$. Let us consider the following specification of the parameters. For all index pairs $(i,j)$ with $1 \le i < j \le n$, we set

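$$\alpha_{ij}^1 = 1,\ \beta_{ij}^1 = 0,\ \theta_{ij}^1 = -1,\ \gamma_{ij}^1 = \epsilon_{ij}^1 = -1; \qquad (86)$$

$$\alpha_{ij}^2 = 0,\ \beta_{ij}^2 = 1,\ \theta_{ij}^2 = -1,\ \gamma_{ij}^2 = -1,\ \epsilon_{ij}^2 = -1; \qquad (87)$$

$$\alpha_{ij}^3 = -1,\ \beta_{ij}^3 = 0,\ \theta_{ij}^3 = 0,\ \gamma_{ij}^3 = 1,\ \epsilon_{ij}^3 = 0; \qquad (88)$$

$$\alpha_{ij}^4 = 0,\ \beta_{ij}^4 = -1,\ \theta_{ij}^4 = 1,\ \gamma_{ij}^4 = 0,\ \epsilon_{ij}^4 = 0; \qquad (89)$$

$$\alpha_{ij}^5 = 1,\ \beta_{ij}^5 = -1,\ \theta_{ij}^5 = 0,\ \gamma_{ij}^5 = 0,\ \epsilon_{ij}^5 = 0; \qquad (90)$$

$$\alpha_{ij}^6 = -1,\ \beta_{ij}^6 = 1,\ \theta_{ij}^6 = 0,\ \gamma_{ij}^6 = 0,\ \epsilon_{ij}^6 = 0. \qquad (91)$$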

It is clear that (86)-(91) satisfy the sufficient condition in Theorem 1. To be specific, for all $(i,j)$ with $1 \le i < j \le n$, we have a) (86)-(91) all satisfy conditions (59)-(62); and b) (87) and (91) satisfy (63), (86) and (90) satisfy (64), (86) and (87) satisfy (65), (88) and (91) satisfy (66), and (89) and (90) satisfy (67). Therefore, one can select $l_i$, i = 1, ..., 5, accordingly. For example, by setting $(l_1, l_2, l_3, l_4, l_5) = (2, 1, 2, 3, 4)$, the condition in Theorem 1 is satisfied and thus the equivalence between (P1) and (P2) is established. Note that our selection of $l_i$, i = 1, ..., 5, indicates that it is sufficient to have the first four sets of parameters. It will become clear later in this section that the last two sets of parameters ensure the superiority of the derived Lagrangian relaxation bound.

With the parameters specified in (86)-(91), we rewrite the quadratic constraints (49) as the following five constraints:


$$x_i y_j^i \ge x_i + x_j - 1; \qquad (92)$$

$$x_j y_i^j \ge x_i + x_j - 1; \qquad (93)$$

$$x_i \ge x_j y_i^j; \qquad (94)$$

$$x_j \ge x_i y_j^i; \qquad (95)$$

$$x_i y_j^i = x_j y_i^j, \qquad (96)$$

for each $(i,j)$, $1 \le i < j \le n$. Note that (90) yields $x_i y_j^i \ge x_j y_i^j$ and (91) yields $x_i y_j^i \le x_j y_i^j$; we thus combine them to form the equality in (96).

By substituting $z_{ij} = x_i y_j^i = x_j y_i^j$ for each pair $1 \le i < j \le n$, the similarity between constraints (92)-(96) and constraints (39)-(41) becomes clear. We next show that the Lagrangian relaxation on constraints (92)-(96) in fact achieves an upper bound at least as tight as $B_{SL}$.
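The link between (92)-(96) and the standard linearization can be checked by brute force for a single index pair. The short Python sketch below enumerates all 16 binary assignments and confirms that (92)-(96) hold exactly when both bilinear terms equal the original product $x_i x_j$. The function names are illustrative, and the argument y_ij stands for $y_j^i$ while y_ji stands for $y_i^j$.

```python
from itertools import product


def satisfies_92_96(x_i, x_j, y_ij, y_ji):
    """Check (92)-(96) for one pair; y_ij is y_j^i and y_ji is y_i^j."""
    return (
        x_i * y_ij >= x_i + x_j - 1      # (92)
        and x_j * y_ji >= x_i + x_j - 1  # (93)
        and x_i >= x_j * y_ji            # (94)
        and x_j >= x_i * y_ij            # (95)
        and x_i * y_ij == x_j * y_ji     # (96)
    )


def products_match(x_i, x_j, y_ij, y_ji):
    """True when both bilinear terms equal the original product x_i * x_j."""
    return x_i * y_ij == x_i * x_j and x_j * y_ji == x_i * x_j


# Enumerate all 16 binary assignments and compare the two conditions.
equivalent = all(
    satisfies_92_96(*v) == products_match(*v) for v in product((0, 1), repeat=4)
)
print(equivalent)
```

In other words, on binary points the term $x_i y_j^i$ plays the role of $z_{ij}$ in (39)-(41), which is the observation the proof of Theorem 2 builds on.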


Theorem 2 The upper bound obtained via the Lagrangian relaxation of (P2) on constraints (92)-(96) is at least as tight as the upper bound obtained via the LP relaxation of (PSL), i.e., $B_{LD} \le B_{SL}$.


Proof:  It  is  clear  that  computing  the  optimal  Lagrangian  dual  of  (PSL)  by  relaxing  (39)-(41) 
yields  an  upper  bound  at  least  as  tight  as  the  one  obtained  by  solving  the  LP  relaxation  of  (PSL) 
[57,  63].  Therefore,  it  suffices  to  prove  that  computing  the  optimal  Lagrangian  dual  of  (P2)  by 
relaxing  (92)-(96)  yields  an  upper  bound  at  least  as  tight  as  the  one  obtained  by  computing  the 
optimal  Lagrangian  dual  of  (PSL). 

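For each index pair $1 \le i < j \le n$, let us denote the Lagrangian multipliers associated with (39) by $\zeta_{ij}$, and those associated with (40) and (41) by $\tau_{ij}$ and $\tau_{ji}$, respectively. We use the shorthand notation $\zeta$ and $\tau$ to represent the vectors containing all Lagrangian multipliers $\zeta_{ij}$ and $\tau_{ij}$. We then present the optimal Lagrangian dual of (PSL) as

$$G_{LD} = G(\zeta^*, \tau^*) = \min_{\zeta, \tau} G(\zeta, \tau), \qquad (97)$$

where

$$G(\zeta, \tau) = \max_{x, z} \Big\{ \sum_{i=1}^{n} \Big( c_i + \sum_{j \ne i} \tau_{ij} - \sum_{j < i} \zeta_{ji} - \sum_{j > i} \zeta_{ij} \Big) x_i + \sum_{i=1}^{n} \sum_{j > i} (c_{ij} + \zeta_{ij} - \tau_{ij} - \tau_{ji}) z_{ij} + \sum_{i=1}^{n} \sum_{j > i} \zeta_{ij} \ \Big|\ (38), (42) \Big\}. \qquad (98)$$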


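Next, let us denote the Lagrangian multipliers associated with (92)-(96) by $\nu_{ij}$, $\nu_{ji}$, $\delta_{ij}$, $\delta_{ji}$, and $\kappa_{ij}$, for each $(i,j)$ with $1 \le i < j \le n$. Note that $\nu_{ij}$, $\nu_{ji}$, $\delta_{ij}$, $\delta_{ji}$ must be non-negative whereas $\kappa_{ij}$ can be any real number. We then present the optimal Lagrangian dual of (P2) as

$$B_{LD} = L(\nu^*, \delta^*, \kappa^*) = \min_{\nu, \delta, \kappa} L(\nu, \delta, \kappa), \qquad (99)$$

where

$$L(\nu, \delta, \kappa) = \max_{x, y} \Big\{ \sum_{i=1}^{n} \Big( c_i - \sum_{j \ne i} (\nu_{ij} + \nu_{ji}) + \sum_{j \ne i} \delta_{ij} \Big) x_i + \sum_{i=1}^{n} \sum_{j > i} \Big( \tfrac{1}{2} c_{ij} + \nu_{ij} - \delta_{ji} + \kappa_{ij} \Big) y_j^i x_i + \sum_{i=1}^{n} \sum_{j > i} \Big( \tfrac{1}{2} c_{ji} + \nu_{ji} - \delta_{ij} - \kappa_{ij} \Big) y_i^j x_j + \sum_{i=1}^{n} \sum_{j > i} (\nu_{ij} + \nu_{ji}) \ \Big|\ (48), (50) \Big\}. \qquad (100)$$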


With the notation introduced above, we have $G_{LD} \le B_{SL}$. Hence, we need to prove $B_{LD} \le G_{LD}$ to complete the proof of the theorem.

Next we show that for any feasible $(\zeta, \tau)$, there exists a feasible $(\nu, \delta, \kappa)$ such that $L(\nu, \delta, \kappa) \le G(\zeta, \tau)$. If this is true, then for $(\zeta^*, \tau^*)$, the multipliers associated with the optimal Lagrangian dual of (PSL), there must exist some feasible $(\nu, \delta, \kappa)$ such that $L(\nu, \delta, \kappa) \le G(\zeta^*, \tau^*)$, which implies that $B_{LD} \le L(\nu, \delta, \kappa) \le G(\zeta^*, \tau^*) = G_{LD}$.

Given any feasible $(\zeta, \tau)$, we let $\delta_{ij} = \tau_{ij}$ for $1 \le i \ne j \le n$, and $\nu_{ij} + \nu_{ji} = \zeta_{ij}$ for $1 \le i < j \le n$. It is clear with the above assignments that

$$c_i + \sum_{j \ne i} \delta_{ij} - \sum_{j \ne i} (\nu_{ij} + \nu_{ji}) = c_i + \sum_{j \ne i} \tau_{ij} - \sum_{j > i} \zeta_{ij} - \sum_{j < i} \zeta_{ji} \qquad (101)$$

for i = 1, ..., n, and

$$\sum_{i=1}^{n} \sum_{j > i} (\nu_{ij} + \nu_{ji}) = \sum_{i=1}^{n} \sum_{j > i} \zeta_{ij}. \qquad (102)$$

Let $z^*$ be an optimal solution to (98) for a feasible $(\zeta, \tau)$. Since the binary variable z is unrestricted in (98), we know that for each $(i,j)$, $1 \le i < j \le n$, $z_{ij}^* = 1$ if $c_{ij} + \zeta_{ij} - \tau_{ij} - \tau_{ji} > 0$ and $z_{ij}^* = 0$ otherwise. We also let $y^*$ be an optimal solution to (100).

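For any $(s,t)$, $1 \le s < t \le n$, we further write $c_{st} + \zeta_{st} - \tau_{st} - \tau_{ts} = (\tfrac{1}{2} c_{st} + \nu_{st} - \delta_{ts} + \kappa_{st}) + (\tfrac{1}{2} c_{ts} + \nu_{ts} - \delta_{st} - \kappa_{st})$. Here we use the fact that $c_{ij} = c_{ji}$ for all $(i,j)$ with $1 \le i < j \le n$. We consider three cases on $c_{st} + \zeta_{st} - \tau_{st} - \tau_{ts}$ in (98).

Case I: $c_{st} + \zeta_{st} - \tau_{st} - \tau_{ts} > 0$. This condition implies that $z_{st}^* = 1$. In this case, we can find $\nu_{st}, \nu_{ts} \ge 0$ and $\kappa_{st} \in \mathbb{R}$ such that $\tfrac{1}{2} c_{st} + \nu_{st} - \delta_{ts} + \kappa_{st} > 0$ and $\tfrac{1}{2} c_{ts} + \nu_{ts} - \delta_{st} - \kappa_{st} > 0$. Hence, we conclude that $(y_t^s)^* = 1$ and $(y_s^t)^* = 1$, which implies that

$$\Big( \tfrac{1}{2} c_{st} + \nu_{st} - \delta_{ts} + \kappa_{st} \Big) (y_t^s)^* x_s^* + \Big( \tfrac{1}{2} c_{ts} + \nu_{ts} - \delta_{st} - \kappa_{st} \Big) (y_s^t)^* x_t^* \le (c_{st} + \zeta_{st} - \tau_{st} - \tau_{ts}) z_{st}^*, \qquad (103)$$

since x is a binary variable.

Case II: $c_{st} + \zeta_{st} - \tau_{st} - \tau_{ts} = 0$. Similar to Case I, it is clear that we can find $\nu_{st}, \nu_{ts} \ge 0$ and $\kappa_{st} \in \mathbb{R}$ such that $\tfrac{1}{2} c_{st} + \nu_{st} - \delta_{ts} + \kappa_{st} = 0$ and $\tfrac{1}{2} c_{ts} + \nu_{ts} - \delta_{st} - \kappa_{st} = 0$, and thus (103) holds trivially with both sides being 0.

Case III: $c_{st} + \zeta_{st} - \tau_{st} - \tau_{ts} < 0$, which implies $z_{st}^* = 0$. It is clear that we can find $\nu_{st}, \nu_{ts} \ge 0$ and $\kappa_{st} \in \mathbb{R}$ such that $\tfrac{1}{2} c_{st} + \nu_{st} - \delta_{ts} + \kappa_{st} < 0$ and $\tfrac{1}{2} c_{ts} + \nu_{ts} - \delta_{st} - \kappa_{st} < 0$. We conclude that $(y_t^s)^* x_s^* = 0$ and $(y_s^t)^* x_t^* = 0$. Hence, (103) follows.

We have proven that for any feasible $(\zeta, \tau)$, there exists some $(\nu, \delta, \kappa)$ such that (103) holds for any index pair $(s,t)$ with $1 \le s < t \le n$. Together with (101) and (102), this completes the proof. □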
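The three cases in the proof only assert that suitable multipliers exist. One explicit choice, sketched below in Python, makes this concrete for a single pair $(s,t)$. It uses the assignments from the proof ($\delta = \tau$, $\nu_{st} + \nu_{ts} = \zeta_{st}$, here with $\nu_{st} = \zeta_{st}$ and $\nu_{ts} = 0$) and then picks $\kappa_{st}$ so that the aggregate coefficient is split evenly between the two bilinear terms. The even split and the function name are assumptions of this sketch; any split that keeps the common sign would serve equally well.

```python
def split_multipliers(c_st, zeta_st, tau_st, tau_ts):
    """Build the two bilinear coefficients for one pair (s, t).

    Uses delta = tau and nu_st + nu_ts = zeta_st (with nu_st = zeta_st,
    nu_ts = 0). kappa_st is chosen, as an assumption of this sketch, so that
    both coefficients equal half of the aggregate coefficient in (98).
    """
    nu_st, nu_ts = zeta_st, 0.0
    delta_st, delta_ts = tau_st, tau_ts
    total = c_st + zeta_st - tau_st - tau_ts
    kappa_st = total / 2 - (0.5 * c_st + nu_st - delta_ts)
    coef_s = 0.5 * c_st + nu_st - delta_ts + kappa_st  # multiplies x_s * y_t^s
    coef_t = 0.5 * c_st + nu_ts - delta_st - kappa_st  # multiplies x_t * y_s^t (c_ts = c_st)
    return total, coef_s, coef_t


# One example per case: positive, negative, and zero aggregate coefficient.
for c, zeta, tau_st, tau_ts in [(4.0, 1.0, 2.0, 0.5), (2.0, 0.0, 3.0, 1.0), (1.0, 1.0, 1.0, 1.0)]:
    print(split_multipliers(c, zeta, tau_st, tau_ts))
```

In each case the two coefficients share the sign of the aggregate coefficient and sum to it, which is exactly what Cases I to III require.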

Remark 8 As a matter of fact, only one of (92) and (93) is needed to ensure the results above. To see this, without loss of generality, we keep (92) and discard (93). This is equivalent to letting $\nu_{ij} = \zeta_{ij}$ and $\nu_{ji} = 0$ for all index pairs $(i,j)$, and the above proof remains valid. Hence, we can strengthen Theorem 2 with r = 5 (i.e., with only four quadratic constraints for each index pair).

Remark 9 If we keep constraints (92)-(96) and add more constraints of form (49) into (P2), the resultant Lagrangian relaxation upper bound can only become tighter or stay the same. For example, we may consider a case of r = 8 by adding (92)-(96) together with (84)-(85) into (P2), which yields an upper bound at least as tight as relaxing only (92)-(96) or only (84)-(85). This indicates that Lagrangian relaxation on constraints (49) gives us flexibility in improving the upper bound, which is potentially significant in a Lagrangian relaxation based branch-and-bound framework.

When we add (92)-(96) to general constrained QBPs, there is typically no fast solution method for the MSAD (i.e., problem (74)) that is induced by relaxing (92)-(96). We thus consider solving the LP relaxation of the MSAD. In the following, we prove that such an LP relaxation can still produce an upper bound at least as tight as $B_{SL}$.

Corollary 2 Consider an MSAD as described above, and let $B_{LD}^{LP}$ be the LP relaxation bound of this MSAD. This upper bound is at least as tight as the LP relaxation bound of (PSL), i.e., $B_{LD}^{LP} \le B_{SL}$.

Proof: Let $G_{LD}^{LP}$ be the optimal objective value of the LP relaxation of (97). Relaxing the integrality restrictions on x in (97) yields an unconstrained problem with a linear objective function. Thus, $G_{LD}^{LP} = B_{SL}$. To prove $B_{LD}^{LP} \le G_{LD}^{LP}$, we can use the same Lagrangian multipliers as in the proof of Theorem 2. Hence, we conclude $B_{LD}^{LP} \le G_{LD}^{LP} = B_{SL}$. □

3.4.3  Structure  Preserving  Decomposition 

As discussed in Section 3.2.2, in order to solve the Lagrangian dual $L(\lambda)$ for given multipliers $\lambda$, i.e., problem (68), one can decompose the problem into n unconstrained linear binary problems, each with respect to only a subset of the variables y (i.e., problem (72)-(73)), and one MSAD (i.e., problem (74)) with respect to only the original x variables. It is well known that an unconstrained linear binary program of size n can be solved in O(n) time. Therefore, it takes $O(n^2)$ time to compute $g_i^*$ for all i in an MSAD. This implies that the computational complexity of solving $L(\lambda)$, as well as of obtaining both upper and lower bounds on the original problem (P2), depends on how fast one can solve the corresponding MSAD.
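The decomposition just described can be sketched in a few lines of Python. Since problems (72)-(74) are not restated here, the sketch assumes that, once the multipliers have been folded in, each $g_i^*$ equals a linear coefficient p[i] of $x_i$ plus the optimal value of an unconstrained linear binary problem in the variables $y_j^i$ with coefficients q[i][j]. The names p, q and both function names are hypothetical.

```python
def solve_y_subproblem(coeffs):
    """Maximize sum(coeffs[j] * y[j]) over binary y: set y[j] = 1 iff coeffs[j] > 0."""
    y = [1 if q > 0 else 0 for q in coeffs]
    return sum(q for q in coeffs if q > 0), y


def solve_unconstrained_msad(g):
    """Maximize sum(g[i] * x[i]) over binary x with no side constraints."""
    x = [1 if gi > 0 else 0 for gi in g]
    return sum(gi for gi in g if gi > 0), x


# Hypothetical data: p[i] is the linear coefficient of x_i and q[i][j] the
# coefficient of x_i * y_j^i after the multipliers have been folded in.
p = [1.0, -4.0, 0.5]
q = [[0.0, 2.0, -1.0], [3.0, 0.0, 0.5], [-2.0, -1.0, 0.0]]

# n subproblems of size O(n) each, hence O(n^2) work in total.
g = [p[i] + solve_y_subproblem(q[i])[0] for i in range(len(p))]
bound, x = solve_unconstrained_msad(g)
print(g, bound, x)
```

Only the last step changes when (P2) carries constraints: the MSAD then keeps those constraints, while the n subproblems in y remain unconstrained.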

It  is  easy  to  see  that  MSAD  has  the  same  set  of  constraints  as  (P2).  Hence,  we  can  directly  apply 
efficient  heuristics  or  exact  solution  methods  that  are  available  to  those  linear  binary  programming 
counterparts of (P2) that have the same set of constraints. To the best of our knowledge, such a
“structure preserving” feature is not inherent to existing general linearization techniques. In this
section,  we  survey  several  classes  of  QBPs  and  illustrate  how  we  can  exploit  their  special  constraint 
structure  to  improve  the  efficiency  of  solving  MSADs.  We  also  discuss  a  few  ideas  regarding  how  to 
cope  with  the  computational  intractability  of  general  constrained  MSADs. 

MSAD  with  Exact  Solution  in  Polynomial  Time 
Unconstrained  QBP 

Applications  of  unconstrained  QBPs  are  numerous.  For  example,  Laughunn  [93]  studied  capital 
budgeting and investment portfolio selection. Chardaire and Sutter discussed several other applications
of unconstrained QBPs in [38]. It is also worth noting that a large portion of QBP solution
studies  are  focused  on  unconstrained  QBPs. 

Clearly, if (P2) of size n is unconstrained, then the respective MSAD is also unconstrained and can be solved in O(n) time. Hence, solving (68) takes only $O(n^2)$ time for unconstrained QBPs.

Dense k-Subgraph Problem (DkS)

In the dense k-subgraph maximization problem [55], we are given a graph G with n nodes, a weight $w_i$ associated with each node i, and a weight $w_{ij}$ associated with each edge between nodes i and j. For a given parameter k, the DkS problem is to find a subgraph of size k that has the maximum total weight of nodes and edges.

The DkS problem can be formulated as:

$$\max_{x}\ \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=1, j \ne i}^{n} \tfrac{1}{2} w_{ij} x_i x_j \qquad (104)$$

$$\text{s.t.}\ \sum_{i=1}^{n} x_i = k; \qquad (105)$$

$$x_i \in \{0, 1\},\ \text{for } i = 1, \dots, n,$$

where $x_i$ is the variable indicating whether node i is selected. In the case where only the edge weights are considered, $w_i$ can simply be set to zero for all i. The MSAD of a DkS problem is then derived as:

$$\max_{x} \Big\{ \sum_{i=1}^{n} g_i^* x_i - \sum_{i=1}^{n} \sum_{j > i} \sum_{l=1}^{r_{ij}} \lambda_{ij}^l \epsilon_{ij}^l \ \Big|\ (105),\ x \in \{0,1\}^n \Big\}. \qquad (106)$$

Problem (106) can be solved by sorting all $g_i^*$, selecting the indices that correspond to the k largest $g_i^*$, setting the corresponding $x_i$ to 1, and setting the other variables $x_i$ to 0. Since sorting can be done in $O(n \log n)$ time, the MSAD of a DkS problem can be solved in polynomial time.
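The sort-and-select procedure for the DkS MSAD is short enough to write out. The Python sketch below takes the coefficients $g_i^*$ as given (a plain list, which is an assumption about their representation) and returns the value of the linear part together with the selected x. The constant term involving the multipliers is omitted because it does not affect which nodes are chosen.

```python
def solve_dks_msad(g, k):
    """Maximize sum(g[i] * x[i]) subject to sum(x) == k with x binary."""
    order = sorted(range(len(g)), key=lambda i: g[i], reverse=True)  # O(n log n)
    chosen = set(order[:k])  # indices of the k largest coefficients
    x = [1 if i in chosen else 0 for i in range(len(g))]
    return sum(g[i] for i in chosen), x


value, x = solve_dks_msad([3.0, -0.5, 0.5, 2.0], k=3)
print(value, x)
```

Note that the cardinality constraint forces exactly k selections, so a negative coefficient is still picked when fewer than k coefficients are positive.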


Quadratic  Semi- Assignment  Problem  (QSAP) 

The quadratic semi-assignment problem [20] is to minimize a quadratic pseudo-Boolean function subject to the semi-assignment constraints. The problem is known to be NP-hard [20]. Many task-assignment problems in distributed systems can be easily formulated as QSAPs [19, 40, 146]. The problem also has important applications in a variety of other fields, e.g., bioinformatics [60]. The quadratic 0-1 formulation of QSAP is given as follows:

$x_{ik} \in \{0, 1\}$, for i = 1, ..., n, k = 1, ..., m.
The  MSAD  of  a  QSAP  is  then  derived  as: 
An optimal solution to (110) can be obtained in O(mn) time, as


® ik 
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0, 


otherwise, 
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(111) 


for i = 1, ..., n. Hence, QSAP is another example whose corresponding MSAD can be solved in polynomial time. Since there are mn variables in total in this problem, the computational complexity of solving the MSAD for a quadratic semi-assignment problem is the same as for an unconstrained QBP.
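To illustrate why an O(mn) pass suffices, the following Python sketch solves a linear problem under semi-assignment constraints, assuming these take the usual form $\sum_{k} x_{ik} = 1$ for each i, and assuming the MSAD objective is a linear function with one coefficient g[i][k] per variable, written here in maximization form as for (P2). Both assumptions, as well as the function name, are this sketch's own; it is one natural reading of the procedure rather than a transcription of (111).

```python
def solve_semi_assignment_msad(g):
    """g[i][k] is the coefficient of x_ik; choose exactly one k for each i.

    Because each row i is constrained independently, the best choice per row
    is simply the position of its largest coefficient: O(m) per row.
    """
    assignment = [max(range(len(row)), key=lambda k: row[k]) for row in g]
    value = sum(row[k] for row, k in zip(g, assignment))
    return value, assignment


value, assignment = solve_semi_assignment_msad([[1.0, 5.0, 2.0], [4.0, 0.0, 3.0]])
print(value, assignment)
```

For a minimization form of the problem, the same scan would pick the smallest coefficient in each row instead.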


Quadratic  Assignment  Problem  (QAP) 

QAP is one of the most important classes of constrained QBPs, as it can be used to model a variety of real-world problems in facility allocation, parallel and distributed computing, and combinatorial data analysis, among others. Surveys on the problem can be found in Burkard [28] and Pardalos et al. [114]. Besides being NP-hard, QAP is known to be computationally challenging even for rather small instances [97]. QAP was first studied by Lawler [94] about half a century ago, and there is an extensive literature dedicated to obtaining bounds or suboptimal solutions to QAPs [9, 12, 13, 14]. Similar to QSAP, a QAP can be formulated as:

$x_{ik} \in \{0, 1\}$, for i = 1, ..., n, k = 1, ..., n.

The MSAD of the QAP is then derived as:

which has the property of total unimodularity and can thus be solved in polynomial time (e.g., with the Hungarian method in $O(n^3)$ time).

General  Constrained  MSAD 

In general, an MSAD is a constrained binary linear program for which no polynomial-time algorithm is known. In this section, we describe three approaches for dealing with the constraints. We note that the performance of these three methods may vary depending on the particular application.


Solving  the  LP  Relaxation  of  MSAD 

A straightforward way to deal with a general constrained MSAD is to relax the integrality restrictions
on x and solve the respective LP relaxation. It is easy to see that the LP relaxation provides
an  upper  bound  to  (P2).  A  potential  drawback  of  using  this  method  is  the  inferiority  of  such  a 
bound.  In  addition,  no  feasible  solution  to  (P2)  is  guaranteed.  A  possible  remedy  is  solving  the 
LP  relaxation  of  the  MSAD  iteratively  to  update  the  Lagrangian  multipliers  until  the  last  iteration 
of  a  subgradient  method  and  only  dealing  with  the  construction  of  a  feasible  solution  at  the  last 
iteration.  In  our  computational  experiments  on  quadratic  binary  knapsack  problem  instances,  we 
use  this  approach  and  apply  a  rounding  heuristic  to  obtain  a  feasible  solution  at  the  last  iteration. 
Relaxing Original Constraints

Fisher [57] discussed the use of Lagrangian relaxation to solve integer programming problems in
general. Given any $\lambda$, we can obtain an upper bound to (74) by further relaxing constraint (48)
and  solving  the  Lagrangian  dual  problem.  The  Lagrangian  relaxation  bound  of  MSAD  is  at  least 
as  tight  as  the  LP  relaxation  bound  [102].  In  the  actual  computational  implementation,  one  can 
combine  the  solutions  of  the  two  Lagrangian  duals  (relaxation  of  both  quadratic  constraints  (49) 
and  original  constraints  (48))  without  destroying  the  decomposability.  This  idea  can  be  beneficial 
in  many  problems,  especially  when  there  are  not  many  original  constraints,  such  as  in  a  quadratic 
binary  knapsack  problem.  It  may  be  also  beneficial  to  consider  relaxing  a  subset  of  (48),  as  suggested 
by  many  studies  in  the  IP  literature. 

Applying  Existing  Heuristics 

Many  binary  linear  programs  (e.g.,  linear  binary  knapsack  problem)  have  efficient  heuristics  to 
obtain  good  suboptimal  solutions.  With  our  decomposition  method,  these  heuristics  can  be  adapted 
to  solve  MSAD  and  provide  feasible  solutions.  In  addition,  we  note  that  surrogate  subgradient 
methods  (e.g.,  [150])  may  be  used  here  to  update  the  Lagrangian  multipliers  effectively  when  only 
a  suboptimal  solution  is  obtained  at  each  step. 

3.4.4  Computational  Considerations  in  the  B&B  Framework 

It is natural to embed our Lagrangian relaxation bounding method into a branch-and-bound framework.
In this section, we discuss two computational issues in our actual implementation of the
branch-and-bound  algorithm. 

Subgradient  Method 

The subgradient method used to solve each Lagrangian relaxation problem is the most time-consuming
part of the branch-and-bound algorithm. We employ two practically useful ideas in
our  actual  implementation  to  alleviate  this  computational  burden. 

First,  we  set  the  maximum  number  of  iterations  in  the  subgradient  optimization  at  each  node 
and  terminate  each  Lagrangian  relaxation  optimization  once  the  number  of  iterations  exceeds  this 
threshold.  We  may  choose  different  threshold  values  for  different  nodes  in  the  tree.  Intuitively, 
with  this  approach,  spending  more  time  at  the  beginning  of  a  branch-and-bound  solution  procedure 
would  likely  lead  to  more  promising  solutions  at  the  beginning  of  the  procedure  and  prevent  serious 
“tailing-off”  effect  at  the  end.  In  our  actual  implementation,  we  set  this  threshold  to  be  1000  at  the 
root  node  and  set  it  to  be  5  at  each  of  the  other  tree  nodes.  Our  preliminary  experiments  indicate 
the  benefit  of  using  this  specific  setting. 

Furthermore, unlike Caprara et al. [32], who used the same initial Lagrangian multipliers throughout the entire branch-and-bound tree, we find through our preliminary experiments that it is beneficial to pass Lagrangian multipliers from parent nodes to child nodes along the tree. That is, when branching at a branch-and-bound tree node, we use the final Lagrangian multipliers from its subgradient iterations as the initial Lagrangian multipliers at its child nodes. This implies that we do not start the subgradient method with all-zero initial Lagrangian multipliers at each tree node in our actual implementation. Intuitively, with this approach, we are more likely to use near-optimal multipliers from the beginning of each Lagrangian relaxation optimization.
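The two ideas above, a node-dependent iteration cap and warm-started multipliers, can be sketched as follows in Python. The oracle interface (a function returning a bound and a subgradient), the constant step size, and the toy dual function are all assumptions made for illustration. In a real branch-and-bound tree the dual function at a child node differs from the one at its parent because some variables are fixed, whereas the toy example reuses the same function.

```python
def run_subgradient(lam0, max_iter, oracle, step):
    """Projected subgradient loop for minimizing a dual function over lam >= 0.

    oracle(lam) returns (bound, subgradient); this signature is an assumption
    of the sketch. A constant step is used only for simplicity.
    """
    lam = list(lam0)
    best = float("inf")
    for _ in range(max_iter):
        bound, subgrad = oracle(lam)
        best = min(best, bound)
        lam = [max(0.0, l - step * g) for l, g in zip(lam, subgrad)]
    return best, lam


def iteration_cap(depth):
    """1000 iterations at the root node, 5 at every other node."""
    return 1000 if depth == 0 else 5


def toy_oracle(lam):
    """Stand-in dual function with minimum value 3 at lam = 1 (illustration only)."""
    return sum((l - 1.0) ** 2 for l in lam) + 3.0, [2.0 * (l - 1.0) for l in lam]


root_best, root_lam = run_subgradient([0.0, 0.0], iteration_cap(0), toy_oracle, step=0.25)
warm_best, _ = run_subgradient(root_lam, iteration_cap(1), toy_oracle, step=0.25)
cold_best, _ = run_subgradient([0.0, 0.0], iteration_cap(1), toy_oracle, step=0.25)
print(root_best, warm_best, cold_best)
```

With only five iterations available at a non-root node, the warm-started run begins near the minimizer and reports a tighter bound than the run restarted from zero, which is the intuition given above.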
Branching Rule

For a binary optimization problem, branching is equivalent to fixing the value of the selected variable to 0 or 1. It is important to note that at each node of our branch-and-bound tree, we select not only a variable $x_i$ to fix, but also all the corresponding variables $y_i^j$, j = 1, ..., n, $j \ne i$. Therefore, the maximum number of variables that need to be branched on is n.

Unlike standard linearization methods that lack a general guideline for branching, our Lagrangian relaxation method naturally lends itself to an efficient branching rule. For each $x_i$, let $w_i = \sum_{j > i} \sum_{l=1}^{r} \lambda_{ij}^l + \sum_{j < i} \sum_{l=1}^{r} \lambda_{ji}^l$. We then select the variable with the largest corresponding w value to branch on at each node of the branch-and-bound tree, i.e., $i^* = \arg\max_{i = 1, \dots, n} \{w_i\}$. In other words, we select the variable with the largest sum of the Lagrangian multipliers associated with it. Intuitively, this branching rule may identify the most “unstable” x variable. Here, the level of “stability” indicates how much a variable $x_i$ deviates from its associated variables $y_i^j$. To better understand this, take r = 2 for instance. In (83), for each $(i,j)$ with $1 \le i < j \le n$, $(\lambda_{ij}^l)^k$, l = 1 or 2, increases at iteration k only when the corresponding constraint (84) or (85) does not hold. Therefore, the larger the final Lagrangian multipliers are, the more their corresponding constraints are violated. A major contributor to the violation is the difference between each variable $x_i$ and its associated $y_i^j$ variables. This argument also applies in general cases.
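The branching rule amounts to summing, for each variable, the multipliers of every pair it takes part in. The Python sketch below assumes the multipliers are stored in a dictionary keyed by index pairs (i, j) with i < j, using 0-based indices. The storage layout, the function names, and the optional `fixed` argument for already-branched variables are hypothetical additions for illustration.

```python
def branching_scores(lam, n):
    """lam maps a pair (i, j) with i < j to the list of its r multipliers."""
    w = [0.0] * n
    for (i, j), values in lam.items():
        total = sum(values)
        w[i] += total  # pair (i, j) contributes to the score of x_i ...
        w[j] += total  # ... and to the score of x_j
    return w


def select_branching_variable(lam, n, fixed=()):
    """Return the free index with the largest score w_i."""
    w = branching_scores(lam, n)
    free = [i for i in range(n) if i not in fixed]
    return max(free, key=lambda i: w[i])


lam = {(0, 1): [0.0, 0.4], (0, 2): [1.5, 0.0], (1, 2): [0.2, 0.1]}
print(branching_scores(lam, 3), select_branching_variable(lam, 3))
```

Computing all scores takes time proportional to the number of multipliers, so the rule adds little overhead compared with the subgradient iterations at each node.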

3.4.5  Computational  Experiments 

We conducted computational experiments on randomly generated test instances of the unconstrained
quadratic binary problem (UQBP) and the quadratic binary knapsack problem (QBKP).
To  show  the  superior  performance  of  our  Lagrangian  relaxation  based  branch-and-bound  method, 
we  compared  it  to  the  direct  CPLEX  MIP  solutions  of  the  standard  linearized  reformulation  (PSL) 
and an MIP reformulation that appears in Oral and Kettani [109], which contains only n auxiliary
continuous variables and n additional constraints. This technique was originally presented in
Glover [64] and Glover and Woolsey [67]. Some other variations are also discussed in [4]. However,
to  the  best  of  our  knowledge,  the  MIP  reformulation  presented  in  Oral  and  Kettani  [109]  introduces 
the  fewest  auxiliary  variables  and  additional  constraints  among  various  reformulations  due  to  this 
technique.  We  use  the  MIP  reformulation  in  Oral  and  Kettani  [109]  to  present  the  computational 
comparison  in  this  chapter  and  term  their  reformulation  as  OK  reformulation. 

To  construct  reformulation  (P2)  for  our  method,  we  added  to  the  original  formulation  (P0), 
two  quadratic  constraints  (84)  and  (85)  for  each  index  pair.  We  set  e  =  —0.7  in  our  experiments. 
To solve (P2), we implemented our branch-and-bound method in C with Python 2.6. With either
comparative linearization technique, we solved the resultant reformulation using CPLEX 10.1 with
default settings. We primarily recorded the solution time, and we set a CPU time
limit of one hour for all three methods. When the time limit was reached, we reported the best
suboptimal  solution.  All  our  computational  experiments  were  conducted  on  a  Linux  2.6.18  64bit 
machine  with  16GB  RAM  and  an  Intel  Xeon  X5365  CPU  of  3.0GHz. 

Test  Instance  Generation 

To design a test instance, we first generated the objective function coefficient matrix, which contains both $c_{ij}$ and $c_i$. Note that we assumed $c_i = c_{ii}$ for each i = 1, ..., n. We used the method described by Pardalos [116]. The off-diagonal elements of the matrix were drawn uniformly from [-50, 50]. For the diagonal elements, two parameter settings were considered. In the first setting, the diagonal elements were drawn uniformly from [-50, 50]. In the second setting, the diagonal elements were drawn uniformly from [0, 75]. All the generated objective coefficient matrices were fully dense. Note that to solve instances with partially dense objective coefficient matrices, we only need to impose the quadratic constraints on the pairs of variables whose cross-term coefficients are nonzero. This improves the solution efficiency but increases the implementation complexity. Hence, we leave its implementation to future work.
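A minimal Python sketch of the objective matrix generation under the two diagonal settings is given below. It assumes real-valued draws and a symmetric matrix ($c_{ij} = c_{ji}$, as used in the proof of Theorem 2). Whether the original generator of [116] draws integers or reals is not stated here, and the function name and seed handling are illustrative.

```python
import random


def generate_objective_matrix(n, diag_range, seed=None):
    """Symmetric, fully dense n x n matrix; c[i][i] plays the role of c_i."""
    rng = random.Random(seed)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        c[i][i] = rng.uniform(*diag_range)  # diagonal setting: (-50, 50) or (0, 75)
        for j in range(i + 1, n):
            c[i][j] = c[j][i] = rng.uniform(-50, 50)  # off-diagonal, c_ij = c_ji
    return c


first_setting = generate_objective_matrix(30, (-50, 50), seed=1)
second_setting = generate_objective_matrix(30, (0, 75), seed=1)
```

Fixing the seed makes an instance reproducible, which is convenient when the same instance must be passed to several solution methods.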

For  the  QBKP  instances,  we  generated  the  constraints  in  the  following  way.  First,  we  randomly 
generated  the  constraint  matrix  using  a  similar  method  as  the  one  used  to  generate  the  objective 
function  coefficient  matrix  in  [116].  Second,  for  each  test  instance,  we  randomly  generated  a  feasible 
solution  and  multiplied  it  with  the  constraint  matrix  to  form  the  nominal  right-hand  side  of  the 
constraints.  In  this  way,  we  ensured  that  each  test  instance  would  have  at  least  one  feasible  solution. 
Finally, we perturbed the right-hand side by adding a random perturbation drawn from the uniform distribution U(0, 20). In this way, we allowed multiple feasible solutions and thus guaranteed the non-triviality of each instance.
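The three steps for generating the QBKP constraints can be sketched in Python as follows. The range of the constraint coefficients is not given above, so the sketch assumes nonnegative weights drawn from U(0, 50). The number of constraints m, the function name, and the seed handling are likewise hypothetical.

```python
import random


def generate_knapsack_constraints(n, m, seed=None):
    """Return (A, b, x0) with A x0 <= b, so x0 is feasible by construction."""
    rng = random.Random(seed)
    # Assumption: nonnegative weights drawn from U(0, 50); the range used for
    # the constraint matrix is not stated in the text.
    A = [[rng.uniform(0, 50) for _ in range(n)] for _ in range(m)]
    x0 = [rng.randint(0, 1) for _ in range(n)]  # random binary solution
    nominal = [sum(a * x for a, x in zip(row, x0)) for row in A]  # A x0
    b = [v + rng.uniform(0, 20) for v in nominal]  # perturbation from U(0, 20)
    return A, b, x0


A, b, x0 = generate_knapsack_constraints(n=30, m=1, seed=7)
```

Because the perturbation is nonnegative, the randomly drawn solution x0 always remains feasible for the perturbed right-hand side.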

We  considered  three  sizes  of  the  problem  (n  =  30, 40,  and  50)  for  each  test  problem  (UQBP  or 
QBKP).  Given  each  problem  size,  we  randomly  generated  10  test  instances  with  each  parameter 
setting  for  generating  the  diagonal  elements  of  the  objective  function  coefficient  matrix.  In  summary, 
we  randomly  generated  120  test  instances  in  total. 

Computational  Results  and  Discussions 

In  the  following  two  tables,  we  report  the  comparative  running  times  (in  seconds)  with  the  three 
methods.  Tables  2  and  3  present  the  results  for  the  tested  UQBP  and  QBKP  instances,  respectively. 
There are two portions in each table, corresponding to the two parameter settings for the
objective function coefficient matrix generation. Note that for each QBKP instance, we solved
the  LP  relaxation  of  the  resultant  MSAD  at  each  iteration  in  the  subgradient  method  until  the 
final  iteration.  We  applied  a  standard  rounding  heuristic  to  obtain  a  feasible  solution  in  the  final 
iteration. 

We  use  “LD”,  “PSL”  and  “OK”  to  represent  the  Lagrangian  decomposition  method  introduced 
in  this  chapter,  and  the  direct  solutions  of  the  PSL  [5]  and  OK  [109]  reformulations,  respectively. 
If  the  time  limit  is  reached,  we  use  “TO”  to  indicate  it  and  report  in  the  following  parentheses 
the  relative  gap  of  the  best  feasible  solution  obtained  by  the  studied  algorithm  to  the  optimal 
solution.  For  all  of  the  test  instances,  our  Lagrangian  relaxation  based  method  outperforms  both 
direct  solutions  of  the  PSL  and  OK  reformulations  in  terms  of  the  solution  time.  Our  method  is  on 
average  two  to  three  times  faster  than  the  direct  CPLEX  solution  of  PSL.  This  ratio  seems  to  be 
insensitive  to  the  size  increase.  We  also  conclude  that  our  method  can  always  find  better  feasible 
solutions  compared  to  the  standard  linearization  technique  for  those  instances  that  cannot  be  solved 
to  optimality  within  the  time  limit.  Although  the  OK  reformulation  can  obtain  an  equally  good 
feasible  solution  for  all  of  the  test  instances,  it  has  to  take  a  rather  large  amount  of  time  for  the 
method  to  discern  the  optimality. 

3.5  Concluding  Remarks  and  Future  Work 

In  this  line  of  research,  we  have  proposed  two  dual  decomposition  schemes  for  stochastic  QBPs 
and  developed  an  innovative  Lagrangian  decomposition  based  method  to  solve  deterministic  QBPs. 
Our focus has been on the latter up to this point. The key to the Lagrangian decomposition based
method is that we introduce parameterized quadratic constraints, which results in solving a series of
binary linear programs to compute a Lagrangian relaxation bound. We provide a sufficient condition
on the parameter specification for the introduced quadratic constraints. We also discuss several
special  cases  of  parameter  specifications  and  their  impacts  on  the  bound  tightness.  We  illustrate 
Diagonal elements: U(-50, 50)

            n = 30                   n = 40                       n = 50
Inst. No.   LD     PSL    OK         LD      PSL      OK          LD        PSL         OK
1           0.74   2.07   4.09       76.13   244      TO(0%)      1418.39   3286.58     TO(0%)
2           0.71   2.60   6.78       10.74   37.31    164.86      1373.89   TO(1.46%)   TO(0%)
3           2.08   6.37   41.61      34.13   99.11    680.93      519.99    1672.28     TO(0%)
4           0.10   1.09   0.15       56.36   222.50   2041.93     811.17    1550.94     TO(0%)
5           0.39   1.42   1.75       67.09   318.13   3421.51     1204.44   2868.07     TO(0%)
6           0.77   2.14   4.24       22.38   41.18    342.88      1311.41   3548.89     TO(0%)
7           0.62   2.56   3.01       21.60   63.53    303.31      573.99    1449.92     TO(0%)
8           0.59   2.66   4.31       50.78   147.06   1811.44     984.99    2720.1      TO(0%)
9           0.72   2.24   3.19       66.61   244.12   3335.31     2236.05   TO(0%)      TO(0%)
10          0.56   1.73   1.55       23.44   66.43    485.69      1173.20   TO(0%)      TO(0%)

Diagonal elements: U(0, 75)

            n = 30                   n = 40                       n = 50
Inst. No.   LD     PSL    OK         LD      PSL      OK          LD        PSL         OK
1           0.35   1.21   1.02       10.15   27.37    145.59      1495.81   2978.02     TO(0%)
2           0.27   1.08   0.27       16.75   31.18    136.12      468.73    1616.57     TO(0%)
3           0.18   0.83   0.35       16.80   47.93    251.56      639.23    1275.13     TO(0%)
4           0.18   0.91   0.50       7.66    20.69    37.88       324.76    1148.55     TO(0%)
5           0.43   1.61   1.40       29.67   66.70    434.75      548.56    1704.43     TO(0%)
6           0.25   0.99   0.98       6.48    16.16    46.87       156.65    360.73      1433.89
7           0.25   1.06   0.35       22.48   79.36    538.45      1072.17   TO(2.08%)   TO(0%)
8           0.69   2.26   2.53       17.54   37.93    279.88      2254.32   TO(0.01%)   TO(0%)
9           0.08   0.68   0.06       6.88    14.43    17.08       362.04    433.83      TO(0%)
10          0.60   1.86   2.21       30.58   81.72    1469.03     639.57    1802.08     TO(0%)

Table 2: Comparative results on the UQBP instances (running times, in seconds)
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Diagonal  elements 

U  (—50, 50) 

Size 

n  =  30 

n  =  40 

n  =  50 

Inst.  No. 

LD 

PSL 

OK 

LD 

PSL 

OK 

LD 

PSL 

OK 

1 

0.33 

1.17 

0.68 

33.97 

69.48 

791.43 

1517.75 

2982.94 

TO(0%) 

2 

1.40 

3.81 

7.55 

20.35 

24.89 

107.72 

2686.68 

TO  (0.74%) 

TO(0%) 

3 

1.35 

3.15 

15.74 

38.59 

125.41 

997.92 

459.77 

1199.46 

TO(0%) 

4 

1.39 

3.53 

16.23 

39.18 

73.95 

978.38 

1064.81 

2813.01 

TO(0%) 

5 

1.57 

2.61 

5.60 

29.99 

42.17 

386.29 

586.36 

1245.19 

TO(0%) 

6 

0.51 

1.53 

2.35 

52.13 

147.57 

1645.26 

1282.25 

2838.76 

TO(0%) 

7 

0.63 

2.69 

4.55 

49.67 

57.78 

674.76 

TO(0%) 

TO(6.53%) 

TO(0%) 

8 

0.73 

2.10 

3.39 

18.97 

43.66 

239.95 

TO(0%) 

TO  (3.00%) 

TO(0%) 

9 

2.11 

4.65 

11.55 

69.06 

128.07 

2405.16 

2649.65 

TO(6.01%) 

TO(0%) 

10 

0.36 

1.62 

1.58 

12.38 

23.09 

81.17 

400.87 

876.90 

TO(0%) 

Diagonal  elements:  U( 0,75) 


Size 

n  =  30 

n  =  40 

n  =  50 

Inst.  No. 

LD 

PSL 

OK 

LD 

PSL 

OK 

LD 

PSL 

OK 

1 

1.15 

2.57 

3.17 

37.15 

137.11 

1781.01 

2498.06 

TO(1.28%) 

TO(0%) 

2 

1.18 

3.84 

11.81 

128.24 

228.96 

3147.89 

1078.95 

2585.18 

TO(0%) 

3 

0.57 

2.09 

3.15 

21.46 

27.70 

178.57 

TO(0%) 

TO(8.01%) 

TO(0%) 

4 

0.31 

1.10 

0.65 

90.10 

226.75 

1821.99 

2208.05 

3158.26 

TO(0%) 

5 

5.57 

10.22 

72.21 

96.19 

117.34 

1159.45 

1299.29 

3337.02 

TO(0%) 

6 

0.36 

1.32 

0.83 

42.71 

75.95 

726.00 

2422.73 

TO(0%) 

TO(0%) 

7 

0.96 

1.64 

3.87 

104.81 

217.69 

TO(0%) 

3507.39 

TO  (2.46%) 

TO(0%) 

8 

1.82 

2.56 

9.35 

57.75 

82.70 

1232.01 

273.90 

486.37 

TO(0%) 

9 

1.37 

4.69 

16.56 

68.99 

124.38 

2573.05 

921.49 

TO(0%) 

TO(0%) 

10 

1.68 

3.51 

5.50 

49.87 

81.94 

800.43 

TO(0%) 

TO(3.42%) 

TO(0%) 

Table  3:  Comparative  results  on  the  QBKP  instances  (running  times,  in  seconds) 
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that  our  method  does  not  change  the  underlying  structure  of  the  original  QBP.  Therefore,  we  have 
the  potential  to  use  the  existing  fast  solution  methods  for  the  problem’s  IP  counterpart.  Computa¬ 
tionally,  we  discuss  several  practically  useful  ideas,  including  parameter  specifications,  subgradient 
method  for  computing  Lagrangian  duals,  and  variable  selection  in  the  branch-and-bound  algorithm. 
Our  numerical  experiments  on  the  two  classes  of  QBPs  show  that  our  method  outperforms  two  well 
known  linearization  techniques.  As  for  the  development  of  the  dual  decomposition  schemes  for 
generic  SQBPs,  our  aim  is  to  seamlessly  integrate  dual  decomposition  and  linearization.  We  are 
currently  conducting  computational  experiments  on  larger  instances  to  investigate  the  tradeoff  be¬ 
tween  bound  tightness  and  computational  time  on  various  proposed  Lagrangian  relaxation  bounds. 

For the Lagrangian decomposition based method, we propose several research items for further
improvement. First, in this work we use a specific parameter setting for the pair of quadratic
constraints (84) and (85) in all of our computational experiments. In the future, it is worthwhile
to investigate the impact of other parameter settings (e.g., the value of e) for the case r = 2 both
analytically and computationally. Furthermore, in this work, we present some preliminary attempts
to understand the impact of r > 2. It is worthwhile to explore more general cases of using the
proposed family of quadratic constraints. Analytically, we plan to evaluate the bounding performance
with additional constraints in the reformulation. Computationally, we plan to investigate how each
case affects the solution of the respective Lagrangian duals. Note also that in this work the
subgradient method for computing the Lagrangian relaxation bound is the most time-consuming part
of the branch-and-bound algorithm. Therefore, we need to investigate more advanced subgradient
methods as well as their integration within the branch-and-bound algorithm. Finally, we plan to
tune our algorithm for solving other classes of QBPs, e.g., quadratic assignment problems.
For the application of dual decomposition to solving SQBPs, our future work will focus on the
computational aspect. Once both parts of this research are more mature, we plan to incorporate the
parametric quadratic constraints introduced in our deterministic QBP research into the dual
decomposition for SQBPs. Our ultimate goal is to develop a computationally attractive Lagrangian
decomposition based branch-and-bound method for generic SQBPs.
4 Two-Stage Stochastic Minimum s-t Problem

This  chapter  is  mostly  based  on  the  results  from: 

• S. Rebennack, O.A. Prokopyev, “Two-Stage Stochastic Minimum s-t Problem: Formulations
and Complexity,” Technical Report, 2011.


4.1  Introduction 

Let $G = (V, E)$ be a directed graph with node set $V$, arc set $E \subseteq V \times V$ and
nonnegative costs $c_{ij}$ given for each arc $ij \in E$. The minimum s-t cut problem for directed
graphs can be defined as follows [130]. For a given directed connected graph $G = (V, E)$ with root
node $s$ and terminal node $t$, the task is to find a node set $S \subset V$ with $s \in S$ and a
node set $T \subset V$ with $t \in T$, such that $S \cup T = V$, $S \cap T = \emptyset$ and the cost
of the cut $c[S, T] := \sum_{ij \in E:\, i \in S \wedge j \in T} c_{ij}$ is minimized.

Graph connectivity is one of the classical research topics in graph theory, with a variety
of practical applications, in particular in network design [8]. The minimum s-t cut problem
and the maximum flow problem are dual to each other. This relation is known as the Max-Flow
Min-Cut Theorem and was first proven by Ford and Fulkerson [58]. This duality has driven the
development of many polynomial time algorithms for computing minimum s-t cuts [8].

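The following mathematical programming formulation of the minimum s-t cut problem dates
back to Ford and Fulkerson in 1962 [59]. Let

$$x_i = \begin{cases} 1, & \text{if } i \in T \\ 0, & \text{if } i \in S \end{cases}
\qquad \text{and} \qquad
y_{ij} = \begin{cases} 1, & \text{if } i \in S,\ j \in T \\ 0, & \text{otherwise.} \end{cases} \tag{117}$$

This allows the following linear 0-1 programming problem formulation:

\begin{align}
\min \quad & \sum_{ij \in E} c_{ij} y_{ij} \tag{118} \\
\text{s.t.} \quad & y_{ij} \ge x_j - x_i \qquad \forall\, ij \in E \tag{119} \\
& x_s = 0, \quad x_t = 1 \tag{120} \\
& x_i, y_{ij} \in \{0, 1\} \qquad \forall\, i \in V,\ ij \in E \tag{121}
\end{align}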

Note that Ford and Fulkerson consider the equation $x_t - x_s = 1$ instead of (120) in [59]. The
constraint matrix defined by (119)-(120) is totally unimodular [59] (see further discussion in
Sections 4.2.1 and 4.2.2), allowing the variables $x$ and $y$ to be relaxed to be non-negative,
continuous and bounded above by 1. This provides another indication of the fact that the minimum
s-t cut problem is polynomially solvable.
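The max-flow/min-cut duality mentioned above can be made concrete with a small sketch. The Python code below is an illustration only, not the implementation used in this report: it runs the Edmonds-Karp variant of the Ford-Fulkerson method on a dense capacity matrix and reads off the node set $S$ as the nodes still reachable from $s$ in the final residual graph. The function name `min_st_cut` and the 0-based node labels are assumptions made for the example.

```python
from collections import deque

def min_st_cut(n, arcs, s, t):
    """Return (cut_value, S) for a directed graph with nodes 0..n-1.

    arcs maps (i, j) -> nonnegative cost c_ij. The cut value equals the
    maximum s-t flow; S is the source side of a minimum cut.
    """
    cap = [[0] * n for _ in range(n)]
    for (i, j), c in arcs.items():
        cap[i][j] += c

    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if cap[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            break  # no augmenting path left: the flow is maximum
        # Bottleneck capacity along the path found.
        bottleneck = float("inf")
        v = t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        # Augment and update residual capacities.
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

    S = {v for v in range(n) if parent[v] != -1}
    return flow, S

# Hypothetical 4-node instance with s = 0 and t = 3.
example_arcs = {(0, 1): 3, (0, 2): 2, (1, 2): 1, (1, 3): 2, (2, 3): 3}
value, S = min_st_cut(4, example_arcs, 0, 3)
cut_cost = sum(c for (i, j), c in example_arcs.items() if i in S and j not in S)
```

In this example the cost of the arcs leaving `S` coincides with the flow value, which is the statement of the Max-Flow Min-Cut Theorem for this instance.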

An alternative definition of the minimum s-t cut problem can be provided using the notion
of a cutset. Define a cutset as a set of arcs whose removal ensures that there is no directed
path from $s$ to $t$. Then the minimum s-t cut problem can be defined as the problem of finding
a cutset of minimum weight. Note that formulation (118)-(121) has a clear interpretation in
this framework since variable $y_{ij}$ is 1 if the corresponding arc $ij \in E$ belongs to the
required cutset. Motivated by this fact, we introduce the following two-stage stochastic extension
of the original deterministic problem.

Definition 1 (two-stage stochastic minimum s-t cut) Given is a directed graph $G = (V, E)$
with node set $V$ and arc set $E$ and a root $s \in V$. There are $K$ scenarios. The $k$th scenario
consists of a single terminal $t_k$ and has probability $p_k$ of being realized. Arc $ij \in E$ has
cost $c_{ij}$ in the first stage and $d^k_{ij}$ in the recourse stage (or second stage) if the $k$th
scenario is realized. The task is to find a set of arcs $E_0$ to be cut in the first stage and, for
each scenario $k$, an arc set $E_k$ to be cut in the recourse stage if scenario $k$ is realized, such
that removing $E_0 \cup E_k$ from the graph $G$ disconnects $s$ from the terminal $t_k$. The objective
is to minimize the expected cost over all scenarios:

$$z^{A} := \min \left( \sum_{ij \in E_0} c_{ij} + \sum_{k=1}^{K} p_k \sum_{ij \in E_k} d^k_{ij} \right)$$


A number of studies demonstrate the potential benefits of stochastic programming solutions
over deterministic approaches [133]. Stochastic programming models take advantage of the fact
that probability distributions governing the data are known or can be estimated. Therefore, it
is natural to consider stochastic extensions of classical graph and network design problems. The
authors in [48, 70] discuss a somewhat related robust s-t min-cut problem, where the task is
to minimize the maximum cost over all scenarios while disconnecting $s$ from the terminals $t_k$.
Other recent examples include two-stage stochastic extensions of maximum weight matching [90],
shortest path [70], minimum spanning tree [49, 61] and Steiner tree [75] problems. For an
introduction to stochastic programming, we refer the reader to Birge and Louveaux [23].

The remainder of this chapter is organized as follows. In Section 4.2, we provide a linear mixed
0-1 programming formulation of the two-stage stochastic minimum s-t cut problem that is a
generalization of the classical model (118)-(121). Unfortunately, the constraint matrix of the
proposed mathematical program loses the total unimodularity property (Section 4.2.2) of the original
deterministic formulation; however, this property is preserved if graph $G$ is a tree (Section 4.2.3).
This is not surprising, as we prove in Section 4.3 that the considered problem is NP-hard, while a
linear time solution algorithm is available when the graph is a tree (see Section 4.4).
In Section 4.5, we discuss another variation of the two-stage stochastic minimum s-t cut problem
(referred to as the node-based version) that is motivated by an alternative formulation for the
deterministic problem via a quadratic 0-1 program. Finally, Section 4.6 concludes the discussion.

We  should  also  note  that  as  a  side  result  in  Section  4.2.1  we  derive  a  new  characterization  of 
totally  unimodular  matrices  that  generalizes  some  of  the  well-known  results  by  Camion  [30,  31]. 
This  characterization  is  necessary  for  our  discussion  in  Section  4.2.3. 

4.2  Mathematical  Programming  Formulation 

Let us illustrate the two-stage stochastic minimum s-t cut problem using the graph in Figure 1.
This graph has four nodes with node 1 as the root and node 4 as the terminal. Two scenarios are
given with equal probabilities of 0.5.

In the two-stage stochastic minimum s-t cut problem, one has to decide which arcs have to
be cut in the first stage and in the second stage, where the cut in the second stage depends on the
particular scenario that is realized. An optimal solution using this arc-based interpretation is
shown in Figure 2. In the first stage, both arcs (2,3) and (3,2) are cut. In the second stage, either
arcs (1,2) and (3,4) are cut in case of scenario 1, or arcs (1,3) and (2,4) are cut in case of
scenario 2. This way, the optimal objective function value is 4.

Interestingly, in both scenarios, the resulting graph is disconnected after removing the arcs, but
the removal of the arcs does not correspond to a “minimum cut” in the classic sense, i.e., as a
partition of the nodes. This is because both arcs (2,3) and (3,2) are always cut in the first stage.
We interpret this solution as hedging of arcs. However, note that the resulting arcs define a valid
cutset.
Figure 1: A two-stage minimum s-t cut instance with node 1 as the root node and node 4 as the
destination node. Given are two scenarios with equal probability.


On  the  other  hand,  let  us  consider  each  scenario  independently  after  the  arcs  that  are  cut  in 
the  first  stage  are  removed  from  G.  Then  we  can  observe  that  the  resulting  subproblem  for  each 
scenario  is  a  classical  minimum  s  —  t  cut  problem  that  can  be  equivalently  interpreted  either  as  a 
partition  of  the  nodes  or  a  cutset  of  arcs. 
Figure 2: Hedging: in the first stage, both arcs (2,3) and (3,2) are cut; all arcs are cut in this
graph. The cost of this minimum cut is 4.

Let us now discuss two cases in which the arc costs have a specific structure.

Case 1: $c_{ij} \le p_k d^k_{ij}$, $\forall ij \in E$, $k = 1, \dots, K$. The first stage cutting
cost for each arc is less than or equal to the cost of cutting in each of the $K$ scenarios in the
second stage, weighted with the scenario probability. Thus, there is no need to cut in the second
stage. Now, the problem transforms into a cut problem, where $K$ terminals have to be cut from a
single source $s$. This problem can be transformed into a (deterministic) minimum s-t cut problem
by introducing a super terminal node $t$ to which each of the $K$ terminals is connected by an arc
with cost $+\infty$.

Case 2: $c_{ij} \ge d^k_{ij}$, $\forall ij \in E$, $k = 1, \dots, K$. The first stage arc cost is
greater than or equal to the cost of cutting in the second stage in any of the scenarios. In this
case, no arcs need to be cut in the first stage and the second stage problem decomposes into $K$
independent (deterministic) minimum s-t cut problems.
Thus, in both cases, the two-stage stochastic minimum s-t cut problem is solvable in polynomial
time. This is summarized in the following Corollary.

Corollary 1 The two-stage stochastic minimum s-t cut problem is solvable in strongly polynomial
time for arbitrary graphs if one of the two cases holds true:

$$c_{ij} \le \min_{k=1,\dots,K} \{ p_k d^k_{ij} \} \quad \forall ij \in E,$$

or

$$c_{ij} \ge \max_{k=1,\dots,K} \{ d^k_{ij} \} \quad \forall ij \in E.$$
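The two reductions behind Corollary 1 can be sketched in a few lines. The following Python code is a hedged illustration, not the authors' implementation: `max_flow` is a plain Edmonds-Karp routine standing in for any strongly polynomial maximum flow algorithm, and the names `solve_case1`, `solve_case2`, the 0-based node labels and the example data are assumptions. The sketch also assumes that the root is not itself a terminal, since otherwise the infinite-cost arcs would create an unbounded augmenting path.

```python
from collections import deque

def max_flow(n, cap, s, t):
    """Edmonds-Karp on a dense capacity matrix (cap is modified in place)."""
    flow = 0
    while True:
        parent = {s: s}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in range(n):
                if cap[u][v] > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path = []
        v = t
        while v != s:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= delta
            cap[v][u] += delta
        flow += delta

def min_cut_value(n, costs, s, t):
    """Minimum s-t cut value; costs maps (i, j) -> nonnegative arc cost."""
    cap = [[0] * n for _ in range(n)]
    for (i, j), c in costs.items():
        cap[i][j] += c
    return max_flow(n, cap, s, t)

def solve_case1(n, c, s, terminals):
    """Case 1: cut only in the first stage, using a super terminal node."""
    costs = dict(c)
    super_t = n  # extra node appended after the original nodes 0..n-1
    for tk in set(terminals):
        costs[(tk, super_t)] = float("inf")  # these arcs are never cut
    return min_cut_value(n + 1, costs, s, super_t)

def solve_case2(n, d, p, s, terminals):
    """Case 2: K independent second-stage min cuts, weighted by p_k."""
    return sum(p[k] * min_cut_value(n, d[k], s, terminals[k])
               for k in range(len(terminals)))

# Hypothetical data: root 0, terminals 2 and 3, arcs (0,1), (1,2), (1,3).
c_case1 = {(0, 1): 3, (1, 2): 1, (1, 3): 1}
value_case1 = solve_case1(4, c_case1, 0, [2, 3])

d_case2 = [{(0, 1): 3, (1, 2): 1, (1, 3): 4}, {(0, 1): 2, (1, 2): 5, (1, 3): 4}]
value_case2 = solve_case2(4, d_case2, [0.5, 0.5], 0, [2, 3])
```

The functions do not check the cost conditions of the corollary; they only carry out the corresponding reduction, so the caller is responsible for verifying that the respective case applies.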

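Allowing “hedging” of arcs, based on (118)-(121), we obtain the following formulation for the
two-stage stochastic minimum s-t cut problem

\begin{align}
\min \quad & \sum_{ij \in E} c_{ij} y_{ij} + \sum_{k=1}^{K} p_k \sum_{ij \in E} d^k_{ij} u^k_{ij} \tag{122} \\
\text{s.t.} \quad & u^k_{ij} + y_{ij} \ge x^k_j - x^k_i, \qquad \forall ij \in E,\ k = 1, \dots, K; \tag{123} \\
& x^k_s = 0, \qquad k = 1, \dots, K; \tag{124} \\
& x^k_{t_k} = 1, \qquad k = 1, \dots, K; \tag{125} \\
& x^k_i, y_{ij}, u^k_{ij} \in \{0, 1\} \qquad \forall i \in V,\ \forall ij \in E,\ k = 1, \dots, K, \tag{126}
\end{align}

where we define variables $u^k_{ij}$ similar to (117) as the arc to be cut for scenario $k$, and
variable $x^k_i$ has value one if node $i \in V$ belongs to set $T_k$; otherwise it has value 0 and
the node belongs to set $S_k$.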

Proposition 1 Formulation (122)-(126) models the two-stage stochastic minimum s-t cut
problem correctly.

Proof: Let arc sets $E_0$ and $E_k$ define a two-stage s-t cut for graph $G$. Arc set $E_k$ defines
sets $S_k$ and $T_k$ for each scenario $k$. With this, assign variables $x^k_i$ values 0 or 1
accordingly. The cut variables $y$ and $u^k$ obtain their values according to the arc sets $E_0$ and
$E_k$. With this assignment, equation (123) is satisfied because otherwise, the sets $S_k$ and $T_k$
do not define a cut. The objective function value of this cut is calculated correctly.

It remains to show that any optimal solution of (122)-(126) defines a two-stage s-t cut for
graph $G$. Therefore, assume that we are given an optimal solution of (122)-(126) and that the
graph is not disconnected. Hence, for at least one scenario $k$, there is a (directed) path from root
$s$ to terminal $t_k$. Equations (123) imply that all variables $x^k_i$ for nodes $i$ along this path
have the same value, which contradicts equations (124) and (125). □

The variables $y_{ij}$ in equations (123) represent the cuts in the first stage. As such, the
$y_{ij}$ variables connect the $K$ scenarios. Thus, if variables $y_{ij}$ are fixed to 0 or 1, then
problem (122)-(126) decomposes into $K$ separate minimum s-t cut problems of type (118)-(121).
Thus, each of the $K$ optimization problems is a linear program. This structure suggests that we
could develop a solution algorithm based on the Benders decomposition approach [101], where the
master problem contains the $y_{ij}$ variables and the $K$ sub-problems are minimum s-t cut problems
for trial values $y_{ij}$ obtained from the master problem.
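The decomposition for fixed first-stage variables can be illustrated by brute force on a tiny instance. The Python sketch below is not the Benders scheme itself: it simply enumerates every first-stage cut set $E_0$ and, for each one, solves the $K$ independent second-stage minimum cut problems by enumerating node partitions, which is only viable for very small graphs. The function names and the cost data are assumptions; the costs are hypothetical values chosen on a six-arc graph so that the optimal solution shows the hedging pattern described for Figures 1 and 2 (they are not the costs of Figure 1).

```python
from itertools import combinations

def min_cut_by_partition(nodes, costs, s, t):
    """Minimum s-t cut value by enumerating all node sets S with s in S, t not in S."""
    others = [v for v in nodes if v not in (s, t)]
    best = float("inf")
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            S = {s, *extra}
            value = sum(c for (i, j), c in costs.items() if i in S and j not in S)
            best = min(best, value)
    return best

def two_stage_min_cut(nodes, c, d, p, s, terminals):
    """Enumerate first-stage cut sets E0; each fixed E0 leaves K separate min cuts."""
    arcs = list(c)
    best_value, best_E0 = float("inf"), None
    for r in range(len(arcs) + 1):
        for E0 in combinations(arcs, r):
            first_stage = sum(c[a] for a in E0)
            second_stage = 0.0
            for k, tk in enumerate(terminals):
                # Arcs cut in the first stage are removed from the scenario graph.
                remaining = {a: cost for a, cost in d[k].items() if a not in E0}
                second_stage += p[k] * min_cut_by_partition(nodes, remaining, s, tk)
            if first_stage + second_stage < best_value:
                best_value, best_E0 = first_stage + second_stage, set(E0)
    return best_value, best_E0

# Hypothetical costs: root 1, terminal 4 in both scenarios.
nodes = [1, 2, 3, 4]
c = {(1, 2): 10, (1, 3): 10, (2, 3): 1, (3, 2): 1, (2, 4): 10, (3, 4): 10}
d = [
    {(1, 2): 1, (1, 3): 10, (2, 3): 10, (3, 2): 10, (2, 4): 10, (3, 4): 1},
    {(1, 2): 10, (1, 3): 1, (2, 3): 10, (3, 2): 10, (2, 4): 1, (3, 4): 10},
]
best_value, best_E0 = two_stage_min_cut(nodes, c, d, [0.5, 0.5], 1, [4, 4])
```

A Benders-type method would replace the outer enumeration by a master problem in the $y_{ij}$ variables and the inner enumeration by a polynomial minimum cut algorithm; the structure of the computation stays the same.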

Recognize that formulation (122)-(126) does not involve any variables $x_i$ for the first stage,
but only for the recourse stage. Furthermore, variables $y$ and $u^k$ can be relaxed to be
non-negative continuous if variables $x^k$ are binary. In order to see this, consider an optimal,
fractional solution for variables $y$ and $u^k$, as well as the corresponding binary solution values
of variables $x^k_i$ and $x^k_j$, and assume that all $d^k_{ij} > 0$. This implies that there must be
an arc $ij \in E$ with $y_{ij} = l \in (0, 1)$; otherwise, each fractional variable $u^k_{ij}$ implies
that $d^k_{ij} = 0$. Then, equations (123) imply that either $u^k_{ij} = 1 - l$ if
$x^k_j - x^k_i = 1$, or $u^k_{ij} = 0$ in all other cases. Considering all scenarios $k$, arc
$ij \in E$ contributes to the objective function the value

$$c_{ij} \cdot l + \sum_{k=1}^{K} p_k d^k_{ij} u^k_{ij}
= l \cdot c_{ij} + (1 - l) \cdot \sum_{k \in \{1,\dots,K\}:\, u^k_{ij} \neq 0} p_k d^k_{ij},$$

which is a convex combination of the two values $c_{ij}$ and
$\sum_{k \in \{1,\dots,K\}:\, u^k_{ij} \neq 0} p_k d^k_{ij}$. As the solution defined by variables
$y$, $u^k$ and $x^k$ is optimal, we obtain that
$c_{ij} = \sum_{k \in \{1,\dots,K\}:\, u^k_{ij} \neq 0} p_k d^k_{ij}$. Hence, the extreme point
solution $y_{ij} = 1$ and $u^k_{ij} = 0$ is also an optimal solution.

It is natural to consider whether variables $x^k$ can be relaxed as well, similar to the
deterministic case. This leads us to the discussion of whether constraint matrix (123)-(125) is
totally unimodular. We show that the constraint matrix of the two-stage stochastic minimum s-t cut
problem loses its property of being totally unimodular when extended from the deterministic case.

4.2.1  Total  Unimodularity 

In this section, we review properties of totally unimodular (TU) matrices. Furthermore, we also
derive a new characterization of TU matrices that generalizes some of the well-known results by
Camion [30, 31]. This characterization is necessary for our further discussion in Section 4.2.3.

A matrix $A$ is totally unimodular (TU) if the determinant of each square submatrix of $A$ has the
value 0, 1, or -1. Recognize that a totally unimodular matrix does not need to be square itself. From
the definition it follows that any totally unimodular matrix has only $\{\pm 1, 0\}$ entries.
Therefore, in the remainder of this section we assume that $A$ always denotes a matrix with
$\{\pm 1, 0\}$ entries.
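For very small matrices, the definition can be checked directly by enumerating all square submatrices. The following Python sketch is an illustration under that assumption (the number of submatrices grows exponentially, so this is not a practical test); the names `det_int` and `is_totally_unimodular` are chosen for the example. Determinants are computed exactly with fraction-free (Bareiss) elimination to avoid floating-point error.

```python
from itertools import combinations

def det_int(rows):
    """Exact determinant of a square integer matrix (Bareiss elimination)."""
    M = [list(r) for r in rows]
    n = len(M)
    sign, prev = 1, 1
    for k in range(n - 1):
        if M[k][k] == 0:
            # Look for a row below with a nonzero entry in column k.
            swap = next((r for r in range(k + 1, n) if M[r][k] != 0), None)
            if swap is None:
                return 0
            M[k], M[swap] = M[swap], M[k]
            sign = -sign
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                M[i][j] = (M[i][j] * M[k][k] - M[i][k] * M[k][j]) // prev
        prev = M[k][k]
    return sign * M[n - 1][n - 1] if n else 1

def is_totally_unimodular(A):
    """Brute-force check: every square submatrix has determinant in {-1, 0, 1}."""
    m, n = len(A), len(A[0])
    for size in range(1, min(m, n) + 1):
        for rows in combinations(range(m), size):
            for cols in combinations(range(n), size):
                sub = [[A[i][j] for j in cols] for i in rows]
                if det_int(sub) not in (-1, 0, 1):
                    return False
    return True

# Arc-node incidence matrix of the directed triangle 1->2, 2->3, 1->3.
incidence = [[-1, 1, 0], [0, -1, 1], [-1, 0, 1]]
# A classical matrix with {0, 1} entries whose determinant is -2.
odd_cycle = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
```

Here `incidence` passes the check, while `odd_cycle` fails because the full 3 x 3 matrix already has determinant -2.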

The next theorem shows the importance of totally unimodular matrices for integer programming.

Theorem 1 ([149]) Matrix $A$ is totally unimodular if and only if, for each integral vector $b$,
the set $\{x \in \mathbb{R}^n : Ax \le b\}$ is an integer polyhedron.

We state two necessary and sufficient conditions for a matrix to be totally unimodular, using a
specific type of matrix called an Eulerian matrix.

Definition 2 ([18]) A matrix $A$ is said to be Eulerian, if

$$\sum_i a_{ij} \equiv 0 \pmod{2} \quad \forall j, \tag{127}$$

and

$$\sum_j a_{ij} \equiv 0 \pmod{2} \quad \forall i. \tag{128}$$

This  enables  us  to  state  the  following  two  theorems  characterizing  totally  unimodular  matrices. 

Theorem  2  ([30])  Matrix  A  is  totally  unimodular,  if  and  only  if  every  square  Eulerian  submatrix 
is  singular. 

Theorem 3 ([31]) Matrix $A$ is totally unimodular, if and only if every square Eulerian submatrix
$E$ satisfies:

$$\sum_{i,j} e_{ij} \equiv 0 \pmod{4}.$$
In the next result, we combine Theorems 2 and 3. The difference from each of these theorems is
that it suffices to check one of the two criteria for each Eulerian submatrix. Thus, one can “mix”
the criteria as needed.

Theorem 4 Let $\mathcal{E}$ be the set of all square Eulerian submatrices of matrix $A$. Matrix $A$
is totally unimodular, if and only if

$$\forall E \in \mathcal{E}: \quad E \text{ is singular or satisfies } \sum_{i,j} e_{ij} \equiv 0 \pmod{4}. \tag{129}$$

Proof: “$\Rightarrow$” This follows directly from Theorems 2 and 3.

“$\Leftarrow$” The proof of this direction is along the lines of [31, Theorem 2]. We need the
following results from [31] and [144]:

Statement 1 (due to R. Gomory) [31]. Let $E$ be a square Eulerian submatrix of a matrix $A$,
such that every proper submatrix of $E$ is totally unimodular; then
$\sum_{i,j} e_{ij} \equiv \det(E) \pmod{4}$.

Statement 2 (due to R. Gomory) [144]. If $A$ is not TU, then $A$ has a submatrix of determinant
$\pm 2$.

Statement 3 (due to R. Gomory) [31]. If for every square Eulerian submatrix $E$ of $A$,
$\det(E) = 0$, then, for every square submatrix $B$ of $A$, $\det(B) \equiv 0 \pmod{2}$ implies
$\det(B) = 0$.

We prove the claim by contradiction. Assume that (129) holds and let $A$ be not TU. By Theorem 2,
there exists a square Eulerian submatrix $E$ of $A$ which is not singular, i.e., $\det(E) \neq 0$.
Note that $E$ is not TU. However, by (129) we know that this matrix satisfies
$\sum_{i,j} e_{ij} \equiv 0 \pmod{4}$.

Without loss of generality assume that $E$ is the smallest square Eulerian submatrix of $A$ such
that $\det(E) \neq 0$. In other words, every proper square Eulerian submatrix $\tilde{E}$ of $E$ is
singular, i.e., $\det(\tilde{E}) = 0$. Then by Theorem 2, every proper square submatrix of $E$ is TU,
which, due to Statement 1, implies that $\sum_{i,j} e_{ij} \equiv \det(E) \pmod{4}$. Recall that
$\sum_{i,j} e_{ij} \equiv 0 \pmod{4}$. Thus, $\det(E) \equiv 0 \pmod{4}$. Then $|\det(E)| \ge 4$
since $\det(E) \neq 0$.

Note that $E$ is not TU. Then by Statement 2 there exists a submatrix $B$ of $E$ such that
$|\det(B)| = 2$. Observe that $B$ is a proper submatrix of $E$; therefore, every square Eulerian
submatrix $\tilde{E}$ of $B$ satisfies $\det(\tilde{E}) = 0$. Therefore, Statement 3 can be applied,
implying that $\det(B) = 0$, which results in a contradiction. □
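Criterion (129) can be compared numerically with the definition of total unimodularity on small matrices. The Python sketch below is a brute-force illustration only; the function names are assumptions made for the example, and the enumeration of all square submatrices is exponential in the matrix size. It checks, for each square Eulerian submatrix, whether it is singular or has an entry sum divisible by 4.

```python
from itertools import combinations

def det_int(rows):
    """Exact determinant of a square integer matrix (Bareiss elimination)."""
    M = [list(r) for r in rows]
    n = len(M)
    sign, prev = 1, 1
    for k in range(n - 1):
        if M[k][k] == 0:
            swap = next((r for r in range(k + 1, n) if M[r][k] != 0), None)
            if swap is None:
                return 0
            M[k], M[swap] = M[swap], M[k]
            sign = -sign
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                M[i][j] = (M[i][j] * M[k][k] - M[i][k] * M[k][j]) // prev
        prev = M[k][k]
    return sign * M[n - 1][n - 1] if n else 1

def square_submatrices(A):
    """Yield every square submatrix of A as a list of rows."""
    m, n = len(A), len(A[0])
    for size in range(1, min(m, n) + 1):
        for rows in combinations(range(m), size):
            for cols in combinations(range(n), size):
                yield [[A[i][j] for j in cols] for i in rows]

def is_eulerian(M):
    """All row sums and all column sums are even, as in (127) and (128)."""
    return (all(sum(r) % 2 == 0 for r in M)
            and all(sum(col) % 2 == 0 for col in zip(*M)))

def satisfies_criterion_129(A):
    """Every square Eulerian submatrix is singular or has entry sum 0 mod 4."""
    for E in square_submatrices(A):
        if is_eulerian(E) and det_int(E) != 0 and sum(map(sum, E)) % 4 != 0:
            return False
    return True

def is_tu_by_definition(A):
    """Every square submatrix has determinant in {-1, 0, 1}."""
    return all(det_int(S) in (-1, 0, 1) for S in square_submatrices(A))

examples = [
    [[-1, 1, 0], [0, -1, 1], [-1, 0, 1]],   # incidence matrix of a directed triangle
    [[1, 1, 0], [1, 0, 1], [0, 1, 1]],      # determinant -2, entry sum 6
    [[1, 1], [1, -1]],                      # determinant -2, entry sum 2
    [[1, -1, 0, 0], [0, 1, -1, 0], [0, 0, 1, -1]],
]
agreement = [satisfies_criterion_129(A) == is_tu_by_definition(A) for A in examples]
```

On these examples both tests give the same answer, as Theorem 4 predicts; the sketch is a sanity check on a few instances, not a substitute for the proof.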

Denote by $I$ an identity matrix with an appropriate dimension. $I|A$ denotes “gluing” matrix $A$
to the right of matrix $I$. We also use the following two results.

Lemma 1 ([149]) Matrix $I|A$ is totally unimodular, if and only if $A$ is totally unimodular.

Lemma 2 Let $A$ be a matrix and $B$ be the matrix where one row with exactly one entry with value
+1 or -1 is added to $A$. Then, matrix $B$ is totally unimodular, if and only if $A$ is totally
unimodular.

Proof: Use the criterion in [1] and observe that $J_1$ and $J_2$ is a partition of the columns for
matrix $A$ with $\left| \sum_{j \in J_1} a_{ij} - \sum_{j \in J_2} a_{ij} \right| \le 1$ for each row
$i$, if and only if it is for $B$. □

4.2.2  General  Case:  Total  Unimodularity  is  Lost 

In order to discuss total unimodularity of the constraint matrix (123)-(125), it suffices to
consider the matrix defined by (123), due to Lemma 2. Let us re-write the constraints in the form
$Ax \le b$. Therefore, let $y$ be the vector of variables $y_{ij}$, $u^k$ be the vector of variables
$u^k_{ij}$ and $x^k$ be the vector of variables $x^k_i$. Furthermore, let $B$ be the arc-node
incidence matrix of graph $G$ with dimension $(m \times n)$, let $I$ be the identity matrix (with
appropriate dimension) and $0$ be the matrix consisting of all 0 entries (with appropriate
dimension). Finally, let $\mathbf{x} = (u^1, \dots, u^K, y, x^1, \dots, x^K)$. This enables us to
re-write (123) as

$$A\mathbf{x}^{\top} =
\begin{pmatrix}
-I & 0 & \cdots & 0 & -I & B & 0 & \cdots & 0 \\
0 & -I & \cdots & 0 & -I & 0 & B & \cdots & 0 \\
\vdots & & \ddots & \vdots & \vdots & \vdots & & \ddots & \vdots \\
0 & 0 & \cdots & -I & -I & 0 & 0 & \cdots & B
\end{pmatrix}
\begin{pmatrix} u^1 \\ \vdots \\ u^K \\ y \\ x^1 \\ \vdots \\ x^K \end{pmatrix}
\le 0. \tag{130}$$


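With Lemma 1, it suffices to consider matrix

$$C =
\begin{pmatrix}
-I & B & 0 & \cdots & 0 \\
-I & 0 & B & \cdots & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
-I & 0 & 0 & \cdots & B
\end{pmatrix}. \tag{131}$$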


Consider now the graph with four nodes and four arcs shown in Figure 3. Recognize that the
graph does not contain a directed cycle (but is not a tree).

Figure 3: The corresponding matrix defined through constraints (123)-(125) is not totally
unimodular for $K \ge 2$.


For $K = 2$ scenarios, matrix $C$ is given as

[Matrix (132): not recoverable from the extracted source.]

where the first four columns correspond to the arc variables $y_{ij}$, columns 5 to 8 to the node
variables $x^1_i$ for scenario 1 and the last four columns to the variables $x^2_i$ for scenario 2,
respectively. Matrix $C$ in (132) is not totally unimodular, as the square sub-matrix

[Sub-matrix $E$: not recoverable from the extracted source.]

has determinant -2. Recognize that the square sub-matrix $E$ is also Eulerian but satisfies neither
the criterion of Theorem 2 nor that of Theorem 3.
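The loss of total unimodularity can be reproduced on a small example. Since the graph of Figure 3 and the matrices above are not available here, the Python sketch below uses a hypothetical graph with four nodes and four arcs, (1,2), (2,3), (1,3) and (3,4), which has no directed cycle and is not a tree; it need not coincide with the graph in Figure 3. The function names, the chosen row and column indices, and the two-arc path used as a tree example are likewise assumptions. The code builds matrix $C$ of (131) and extracts a 6 x 6 Eulerian sub-matrix.

```python
from itertools import combinations

def det_int(rows):
    """Exact determinant of a square integer matrix (Bareiss elimination)."""
    M = [list(r) for r in rows]
    n = len(M)
    sign, prev = 1, 1
    for k in range(n - 1):
        if M[k][k] == 0:
            swap = next((r for r in range(k + 1, n) if M[r][k] != 0), None)
            if swap is None:
                return 0
            M[k], M[swap] = M[swap], M[k]
            sign = -sign
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                M[i][j] = (M[i][j] * M[k][k] - M[i][k] * M[k][j]) // prev
        prev = M[k][k]
    return sign * M[n - 1][n - 1] if n else 1

def build_C(n_nodes, arcs, K):
    """Matrix C of (131): columns are y (one per arc), then x^1, ..., x^K.

    Nodes are labelled 1..n_nodes; row (k, ij) encodes -y_ij + x^k_j - x^k_i.
    """
    m = len(arcs)
    C = []
    for k in range(K):
        for a, (i, j) in enumerate(arcs):
            row = [0] * (m + K * n_nodes)
            row[a] = -1                             # coefficient of y_ij
            row[m + k * n_nodes + (i - 1)] = -1     # tail node i
            row[m + k * n_nodes + (j - 1)] = 1      # head node j
            C.append(row)
    return C

def submatrix(M, rows, cols):
    return [[M[r][c] for c in cols] for r in rows]

def is_totally_unimodular(A):
    """Brute-force check over all square submatrices (tiny matrices only)."""
    m, n = len(A), len(A[0])
    return all(
        det_int(submatrix(A, rows, cols)) in (-1, 0, 1)
        for size in range(1, min(m, n) + 1)
        for rows in combinations(range(m), size)
        for cols in combinations(range(n), size)
    )

# Hypothetical graph: no directed cycle, but an undirected cycle 1-2-3.
C = build_C(4, [(1, 2), (2, 3), (1, 3), (3, 4)], K=2)
# Rows: arcs (1,2), (2,3), (3,4) in scenario 1 and (3,4), (1,3), (1,2) in scenario 2.
# Columns: y_12, x^1_2, x^1_3, y_34, x^2_3, x^2_1.
E = submatrix(C, rows=[0, 1, 3, 7, 6, 4], cols=[0, 5, 6, 3, 10, 8])
det_E = det_int(E)
entry_sum = sum(map(sum, E))

# A rooted-out tree (path 1 -> 2 -> 3) for comparison.
tree_is_tu = is_totally_unimodular(build_C(3, [(1, 2), (2, 3)], K=2))
```

For this hypothetical graph, `E` has even row and column sums, a determinant of absolute value 2 and an entry sum that is not divisible by 4, so it violates both Theorem 2 and Theorem 3, whereas the matrix built from the small tree passes the brute-force check.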


4.2.3  Trees:  Total  Unimodularity  is  Preserved 

In this section, we consider the special case that graph $G$ is a tree, i.e., the graph does not
contain any (undirected) cycles. We provide two proofs that the constraint matrix of (123)-(125) is
totally unimodular in the case of trees.

The first proof, described in Section 4.2.3, defines so-called walks in matrices to show that the
criterion of Theorem 4 is satisfied. We gain deep insights into the structure of matrix $C$ and we
can identify the elements of the tree needed for the TU property of Theorem 4.

In Section 4.2.3, we present an alternative proof that the constraint matrix of (123)-(125) is
TU. The proof relies on regular matroids and their transformations. As a by-product, we learn that
the two-stage stochastic minimum s-t cut problems for trees are single commodity flow problems
on a transformed graph. Furthermore, in Section 4.4, we discuss a linear time algorithm for the
two-stage stochastic minimum s-t cut problem. All this implies that the two-stage stochastic
minimum s-t cut problem is polynomially solvable for trees, just as the demand robust minimum
s-t cut problem is [70].

In the following, we consider only (directed) rooted-out trees, that is, directed graphs where
each node except the root node has indegree 1. The more general case of trees in which
the direction of the arcs does not matter is not of interest in the context of two-stage stochastic
min-cut problems. The reason is that trees that are not strongly connected lead to a 0-cost cut (in
case that a terminal node is not strongly connected to the root node). Recall that a directed graph
is strongly connected if for each pair of nodes, there is a directed path connecting them. If not
mentioned otherwise, by a tree we mean a (directed) rooted-out tree.

We have already seen with Figure 3 that, in general, the constraint matrix (123)-(125) is
not totally unimodular. The underlying reason is that the graph contains an undirected cycle.
Forbidding undirected cycles in graphs leads to trees, and this property of a tree is exactly what
we need in order to prove that the corresponding constraint matrix is totally unimodular.

Proof  via  Walks 

We  start  with  the  following  lemma,  which  is  true  because  in  a  tree  each  node  has  at  most  one 
predecessor. 

Lemma  3  If  the  graph  G  is  a  tree,  then  any  column  of  the  constraint  matrix  defined  by  (123)  has 
at  most  one  entry  with  value  +1. 
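
Lemma 3 can be checked mechanically on small instances. The following is a minimal sketch, assuming the sign convention suggested by the argument above: one row per arc (i, j), with -1 in the column of the tail i and +1 in the column of the head j. The helper names `arc_rows` and `max_plus_ones_per_column` are illustrative and do not appear in the report.

```python
def arc_rows(arcs, n):
    """One row per arc (i, j) over nodes 0..n-1: -1 at tail i, +1 at head j.

    This sign convention is an assumption made for illustration.
    """
    rows = []
    for i, j in arcs:
        row = [0] * n
        row[i] = -1
        row[j] = 1
        rows.append(row)
    return rows


def max_plus_ones_per_column(rows):
    """Largest number of +1 entries found in any single column."""
    return max(sum(1 for row in rows if row[c] == 1) for c in range(len(rows[0])))


# Rooted-out tree: every non-root node has exactly one predecessor.
tree = arc_rows([(0, 1), (0, 2), (1, 3)], 4)
# Not a rooted-out tree: node 2 has two predecessors.
not_tree = arc_rows([(0, 2), (1, 2)], 3)
print(max_plus_ones_per_column(tree), max_plus_ones_per_column(not_tree))
```

In the rooted-out tree no column holds more than one +1, whereas the second graph produces a column with two +1 entries.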

In this section, we denote by E an Eulerian (square) sub-matrix of the matrix C defined in (131).
Let us introduce the following notation: let $\mathcal{J}$ be the set of columns of matrix C, let
$J \subseteq \mathcal{J}$ be the columns of $\mathcal{J}$ corresponding to variable $y$, and let
$J^k \subseteq \mathcal{J}$ be the columns of $\mathcal{J}$ corresponding to variable $x^k$.

[Figure 4: Graph G (panel a) and the corresponding forest TD (panel b) for the Eulerian matrix E defined in (134).]

Consider now the tree G given in Figure 4 (a) and assume that the number of scenarios is greater
than or equal to 3. One square Eulerian sub-matrix E of the corresponding constraint matrix is
given in (134).

We can easily check that the square sub-matrix E of C is Eulerian. In order to more easily recognize
the similarity of E to the original constraint matrix, we include the corresponding column names.
The first five columns of E are from set $J$, corresponding to the arc variables $y$. The rest of the
columns correspond to one of the scenarios $k = 1, 2, 3$ (with $K \geq 3$) and to the node variables $x^k$.

Let E be an Eulerian sub-matrix of C corresponding to directed graph G and consider the case in
which $J \neq \emptyset$. Let $j$ be a column of $J$ with a -1 entry for a scenario $k$. Then, the corresponding row
of this -1 entry is unique and is denoted by $i_1^k$. In the following discussion, we omit the superscript
$k$ and write $i_1$ instead. As E is Eulerian, each row sum has value 0 mod 2. Applying this logic to
row $i_1$ implies that there is exactly one column $j_1 \in J^k$ which has entry ±1 in row $i_1$. Similarly,
each column sum has to equal 0 mod 2. This ensures that there is an odd number of entries ±1 in
column $j_1$ other than the entry ±1 in row $i_1$. Selecting one of these entries and applying the same
argument for the new row, we either obtain a -1 entry in a column of $J$ or a ±1 entry in one of the
other columns of $J^k$. Applying this argument for the new entry leads to a new row selection.
This can be repeated until, eventually, we end up at a -1 entry of a column in $J$ (as each row
sum is 0 mod 2). Let us call this process of proceeding from one column $j_1$ of $J$ to another column
$j_2$ of $J$ a “walk” in matrix E from column $j_1$ to $j_2$. Note that by construction $j_1$ and $j_2$ belong to
the row block of the same scenario.

With  this  notation,  we  can  state  the  next  lemma  which  holds  true  for  any  directed  graph  G. 

Lemma  4  Let  G  be  a  graph  and  E  be  an  Eulerian  sub-matrix  of  C.  Then,  for  any  k,  E  has  an 
even  number  of  -1  entries  in  columns  of  J  belonging  to  the  row  block  of  scenario  k. 

Proof: If $J = \emptyset$, there is nothing to show. Hence, assume $J \neq \emptyset$ and consider a walk from column
$j_1$ to $j_2$. For this walk, mark all the ±1 entries contained in this walk. According to the construction
of the walk, in each row and each column, we mark either exactly two entries or none at all; that
is, an even number of entries.

For that same scenario $k$, pick any column $j \in J$ with an un-marked -1 entry. If there is none,
then we are done with this scenario. Now, assume that there is such an entry.

Construct  a  “walk”  from  this  column  j  to  any  other,  not  fixed,  column  in  J.  Assume  that  we 
cannot  find  any  such  walk.  This  means  that  we  arrive  at  a  row  or  column  in  which  all  non-zero 
entries  have  been  marked.  However,  this  is  a  contradiction  as  this  would  imply  that  there  is  an 
odd  number  of  non-zero  entries  in  this  particular  row  or  column  (as  we  marked  an  even  number  of 
entries  for  each  row  and  column). 

Finally,  we  can  apply  this  analysis  to  each  scenario  k  separately,  until  all  -1  entries  in  columns 
of  J  have  been  marked.  □ 
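
The parity conditions used in the walk construction and the statement of Lemma 4 can be illustrated on a small matrix. The sketch below uses a hypothetical example that is not taken from the report: the path 0 -> 1 -> 2 with two scenarios, where the selected columns are the two arc columns of $J$ followed by the column of node 1 in each scenario. The function names are illustrative.

```python
def is_eulerian(matrix):
    """True if every row sum and every column sum is even."""
    rows_even = all(sum(row) % 2 == 0 for row in matrix)
    cols_even = all(sum(col) % 2 == 0 for col in zip(*matrix))
    return rows_even and cols_even


def minus_ones_in_J(matrix, j_columns, scenario_rows):
    """Count -1 entries of the J columns inside one scenario's row block."""
    return sum(1 for r in scenario_rows for c in j_columns if matrix[r][c] == -1)


# Columns: y_(0,1), y_(1,2), x^1_1, x^2_1 (hypothetical selection).
E = [
    [-1, 0, 1, 0],    # scenario 1, arc (0, 1)
    [0, -1, -1, 0],   # scenario 1, arc (1, 2)
    [-1, 0, 0, 1],    # scenario 2, arc (0, 1)
    [0, -1, 0, -1],   # scenario 2, arc (1, 2)
]

print(is_eulerian(E))
print(minus_ones_in_J(E, [0, 1], [0, 1]), minus_ones_in_J(E, [0, 1], [2, 3]))
```

Each scenario block of this example contains two -1 entries in the columns of $J$, an even number, as Lemma 4 states.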

Recognize that if there is a walk from a column $j_1 \in J$ to $j_2 \in J$ for any particular scenario $k$,
then this walk is unique, as G is a tree. However, starting at a particular column $j \in J$, we can
arrive at different columns of $J$. This is due to (possible) multiple choices for selecting an entry in a
column (for a fixed row).

Furthermore,  recognize  that  the  number  of  columns  J  of  matrix  E  can  be  odd;  see  for  instance 
the  Eulerian  matrix  E  defined  in  (134).  Notice  that  Lemma  4  does  not  require  the  Eulerian  matrix 
to  be  square. 

With the help of Lemma 4, we are able to prove the following results, which may seem surprising
at first glance.

Lemma 5 Let G be a graph and E be an Eulerian sub-matrix of C. If $J \neq \emptyset$ and none of
the columns and rows of E is the 0 vector, then matrix E is square if and only if the entries of each
column $j \in J$ sum to -2; i.e., in each column $j \in J$, there are exactly two non-zero
entries (both having value -1).

Proof: Let us first prove direction “$\Leftarrow$”. Consider a walk in E from a column $j_1 \in J$ to
$j_2 \in J$ and “mark” the rows and columns used in this walk. Assume that both $j_1$ and $j_2$ are not
“marked” initially. Then it is easy to observe that the number of rows covered by this walk is $2 + \ell$,
with $\ell$ being non-negative and integral. The number of columns for this walk is then 2 (for the
columns in $J$) $+\, 1 + \ell$. Hence, the number of “marked” columns is one more than the number of
“marked” rows.

Now, consider another walk in E from the “marked” column $j_1 \in J$ (but not a “marked” row) to an
arbitrary column $j \in J$. In the case that $j = j_2$, the new walk “marks” an additional $2 + m$ new
rows but only $1 + m$ columns (as the columns in $J$ have already been “marked”) for some
non-negative integer $m$. Hence, the number of additional “marked” columns is one less(!) than the
number of additional “marked” rows. In the other case, in which $j \neq j_2$ (i.e., the final column has
not been “marked” yet), we additionally “mark” $2 + m$ rows and $2 + m$ columns.

We can construct a set of walks for each scenario which are row distinct (through the argument
with the “marks”) and cover all rows of matrix E (recognize that E contains no row or column having
only 0 entries). Simply put, every iteration of the “marking” procedure (by an iteration we mean the
construction  of  a  new  walk)  has  three  possible  outcomes:  (i)  the  number  of  additional  “marked” 
columns  and  rows  is  the  same,  i.e.,  exactly  one  column  from  J  is  encountered  (and  “marked”)  for 
the  first  time  in  a  new  walk;  (ii)  the  number  of  additional  “marked”  columns  is  one  less  than  the 
number  of  new  “marked”  rows,  i.e.,  no  new  columns  from  J  are  “marked”;  (iii)  the  number  of 
additional  “marked”  columns  is  one  more  than  the  number  of  new  “marked”  rows,  i.e.,  exactly  two 
new  columns  from  J  are  “marked.” 

Let $x_1$, $x_2$ and $x_3$ be the numbers of times outcomes (i), (ii) and (iii) occurred, respectively.
During the procedure, the number of new columns from $J$ encountered is $x_1 + 2x_3$; the number of
“marked” columns from $J$ encountered is $x_1 + 2x_2$. Since each column of $J$ has exactly two -1 entries
in E, each column is encountered exactly twice: once as a new one and once as a
“marked” one. Therefore, $x_1 + 2x_2 = x_1 + 2x_3$ and $x_2 = x_3$. It immediately follows that for every
outcome (ii), which increases the difference between the numbers of “marked” rows and columns by
one, there exists exactly one outcome (iii), which decreases this difference by one. Therefore, at the
end of the procedure the number of “marked” columns must be equal to the number of “marked”
rows and matrix E has to be square.

For the other direction, assume that there exists a column in $J$ that has more than two -1
entries in E. With the calculations above, it is easy to observe that during the procedure the number
of “marked” columns from $J$ encountered is greater than the number of new columns from
$J$ encountered, i.e., $x_1 + 2x_2 > x_1 + 2x_3$. Therefore, $x_2 > x_3$, which implies that at the end of the
procedure the number of “marked” columns must be less than the number of “marked” rows and
matrix E is not square. □
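
The bookkeeping of this proof can be condensed into one balance equation. The symbols $r$ and $c$ for the final numbers of “marked” rows and columns are introduced here only for illustration; the per-outcome contributions 0, -1 and +1 are those of outcomes (i), (ii) and (iii):

```latex
\begin{align*}
c - r &= 0 \cdot x_1 + (-1) \cdot x_2 + (+1) \cdot x_3 = x_3 - x_2, \\
x_1 + 2x_2 = x_1 + 2x_3 &\;\Longrightarrow\; x_2 = x_3 \;\Longrightarrow\; c = r
  && \text{(every column of $J$ has exactly two $-1$ entries)}, \\
x_1 + 2x_2 > x_1 + 2x_3 &\;\Longrightarrow\; x_2 > x_3 \;\Longrightarrow\; c < r
  && \text{(some column of $J$ has more than two $-1$ entries)}.
\end{align*}
```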

Now,  let  us  define  a  special  tree  structure  resulting  from  an  Eulerian  sub-matrix  E  of  C  (131). 
For the tree construction, we assume that $J \neq \emptyset$. The tree TD is then constructed via Procedure
CT; see below.

Applying  Procedure  CT  to  the  graph  G  of  Figure  4  (a)  for  the  constraint  matrix  (134)  leads 
to  the  forest  TD  shown  in  Figure  4  (b).  Recognize  that  graph  TD  is  disconnected  and  that  all  leaf 
nodes  are  dummy  nodes.  Furthermore,  any  dummy  node  is  either  a  leaf  node  or  the  root  node 
of  a  particular  tree  in  the  forest  TD.  Both  observations  hold  true  in  general;  however,  we  do  not 
provide  a  proof  here  as  we  do  not  require  this  property  in  the  following  discussion,  even  though  the 
proof  follows  immediately  from  Lemma  6  together  with  Lemma  5.  Notice  that  not  all  nodes  of  the 
original  graph  are  included  in  the  forest  TD. 

Procedure Construct Graph TD (CT)

1. For each row $i^k$ of matrix E, one of the following cases (for the columns in $J^k$) holds:

1.1 if there is a -1 and a +1 entry in $i^k$ other than in columns of $J$:
include the corresponding arc in TD (along with the two end nodes)

1.2 if there is only a +1 entry in $i^k$:
the corresponding column with entry +1 corresponds to a node j in the original graph
G. Let i be the predecessor of node j in graph G. Then, include the dummy node d*
and node j in TD along with the arc (d*, j)

1.3 if there is a -1 entry but not a +1 entry in row $i^k$:
the corresponding column with entry -1 corresponds to a node i in the original graph G.
Let j be the successor of node i in graph G associated with this row. Then, include the
dummy node d* and node i in TD along with the arc (i, d*)

1.4 if there are no non-zero entries in $i^k$:
ignore this row

End procedure

The following lemma summarizes the properties of graph TD.


Lemma  6  Graph  TD  has  the  following  properties: 

1.  Each  node  in  the  graph  is  either  a  dummy  node  or  corresponds  to  a  node  in  the  original  graph 
G. 

2. TD is a (directed) forest.

3. For each walk in matrix E, there is an (undirected) path in TD between two dummy nodes
corresponding to this walk.

4. For each dummy node $d_i$ in TD, there is at least one other dummy node $d_j$ in TD to which it
is connected in TD; however, they are not necessarily strongly connected.

5. Any undirected path in TD from one dummy node $d_m$ to another dummy node $d_n$ is of one of
the three following types:

(a)  path  is  only  up  the  tree;  i.e.,  if  the  path  goes  from  node  i  to  j,  then  j  is  a  predecessor  of 
node  i  in  TD, 

(b)  path  is  down  the  tree;  i.e.,  if  the  path  goes  from  node  i  to  j,  then  j  is  a  successor  of 
node  i  in  TD, 

(c)  path  is  first  up  and  then  down  the  tree. 

Proof: Property 1 is immediate.

For  property  2,  we  note  that  each  dummy  node  corresponds  to  an  arc  in  the  original  graph;  that 
is,  the  original  arc  is  replaced  by  a  dummy  node  and  one  or  two  arcs.  Hence,  the  tree  structure  of 
G  is  preserved  in  the  construction  of  TD.  However,  the  connectivity  property  might  be  lost,  leading 
to  a  forest. 

For property 3, consider any walk from a column $j_1 \in J$ to $j_2 \in J$ for a scenario $k$. According to the
construction of TD, the columns $j_1$ and $j_2$ correspond to dummy nodes in TD. In the walk, proceeding from
one column $j_1^k \in J^k$ to another column $j_2^k \in J^k$ corresponds to arc $(j_1, j_2)$ in TD (when using
appropriate labeling of the columns in E). Hence, any walk in E corresponds to a path in TD
between two dummy nodes.

Property  4  follows  from  property  3. 

Property  5  is  immediate  when  using  the  facts  that  TD  is  a  forest  along  with  Lemma  3.  □ 

Recognize that, in general, not every path in TD between two dummy nodes corresponds to a
walk in E; that is, property 3 of Lemma 6 is not a one-to-one correspondence between walks in E and
paths in TD.

Now,  we  are  ready  to  prove  the  main  theorem. 

Theorem  5  If  the  graph  G  is  a  tree,  then  the  constraint  matrix  defined  by  (123)  -  (125)  is  totally 
unimodular. 

Proof: With Lemma 1, the constraint matrix (123) - (125) is totally unimodular if and only if
matrix C is totally unimodular.

In order to prove that C is totally unimodular, we must show, using Theorem 4, that any (square)
Eulerian sub-matrix is singular or that the sum of all its entries is 0 mod 4. Therefore, let E be
any square Eulerian sub-matrix of C.

We use the notation for $\mathcal{J}$, $J \subseteq \mathcal{J}$ and $J^k \subseteq \mathcal{J}$ as introduced above.

Assume  that  E  does  not  contain  any  column  of  J.  In  this  case,  E  is  singular,  as  matrix  C 
without  the  first  m  columns  is  totally  unimodular.  Therefore,  assume  that  E  contains  at  least  one 
column of $J$ and E does not contain any column or row consisting only of 0 entries (otherwise, E is
singular).

Observe that each row sum of E has either value -2 or 0 (due to the network structure of B);
in particular, the row sum of E is not +2.

We have to show that there is always an even number of rows having row sum equal to -2,
implying that the sum of all entries is 0 mod 4.

Therefore,  construct  the  forest  TD  for  E  with  procedure  CT. 

Let us identify the row sums (mod 4) of any “walk” in E from $j_1 \in J$ to $j_2 \in J$ for scenario $k$
and let us name this sum the “walk sum.” Furthermore, we call the row with entry
-1 in column $j_1$ the start row of the walk, the row with entry -1 in column $j_2$ the
end row, and all other rows visited by the walk intermediate rows. The start row and the end
row of any walk have either sum -2 or 0. All intermediate rows have sum 0, as they contain exactly
one +1 and one -1 entry. Hence, the sum of all entries in a walk in E has either value 0, -2, or -4.

Let us examine the forest TD. We already have the connection of a walk in E to a path in TD
via property 3 of Lemma 6. Now, let us assign values to the three possible path types identified in
property 5 of Lemma 6:

1. path is only up the tree: this means that the start row sum is -2 and the end row sum is 0.
The corresponding walk in E has value -2 mod 4.

2. path is down the tree: this means that the start row sum is 0 and the end row sum is -2. The
corresponding walk in E has value -2 mod 4.

3. path is first up and then down the tree: this means that the start row sum is -2 and the
end row sum is -2 as well. The corresponding walk in E has value 0 mod 4.

Through the proof of Lemma 4, we know that we can always construct a set of walks, which are
row distinct, for matrix E which have each of the -1 entries of the columns in $J$ as either a start or
an end row (this is due to the “marking” argument in the proof). Row distinct means in particular
that none of the start and end rows are the same for any two walks in this set.

If we can show that the row sum of all walks in this set of walks is 0 mod 4, then the proof
is complete. Equivalently, one can show that, for the paths corresponding to the set of walks,
every path going only up the tree (between two dummy nodes) is matched by a path going only
down the tree (between two dummy nodes). We will show the latter.

Using Lemma 5, we know that each dummy node d in TD has to be visited exactly two times by
paths corresponding to walks of the set of walks.

Let  us  denote  by  a  loop  a  set  of  paths  in  TD,  where  each  path  is  between  two  dummy  nodes, 
the  end  node  of  one  path  is  the  start  node  of  another  path,  and  starting  at  one  particular  dummy 
node  d,  following  the  paths  will  lead  back  to  dummy  node  d.  These  paths  neither  have  to  be  arc 
nor  node  distinct. 

Now,  we  know  that  the  number  of  visits  per  dummy  node  is  two  and  that  there  exists  such  a 
collection  of  paths  which  visit  all  the  dummy  nodes,  resulting  from  the  set  of  walks.  This  implies 
that  we  can  group  this  set  of  paths  into  loops  in  the  forest  TD. 

Therefore,  let  us  consider  one  such  loop.  If  the  loop  contains  only  paths  first  up  and  then  down 
the  tree,  then  the  corresponding  walk  sum  is  0  and  there  is  nothing  to  show.  Otherwise,  let  this 
loop  contain  one  path  going  only  up  the  tree  (or  only  down  the  tree).  However,  as  TD  consists  of 
trees  and  the  loop  is  closed,  we  eventually  have  to  go  down  the  tree  whenever  we  go  up  the  tree. 
Hence, the number of paths going only up equals the number of paths going only down, leading to
a loop sum of 0 mod 4.
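
The arithmetic behind the loop argument is small enough to sketch in code. The snippet below only encodes the walk values assigned to the three path types above (-2, -2 and 0); the labels "up", "down" and "up_down" and the function name are illustrative choices.

```python
# Walk values of the three path types, as assigned in the proof.
WALK_VALUE = {"up": -2, "down": -2, "up_down": 0}


def loop_sum_mod4(path_types):
    """Sum of the walk values of all paths in a loop, reduced mod 4."""
    return sum(WALK_VALUE[p] for p in path_types) % 4


# A closed loop contains as many "up" paths as "down" paths.
print(loop_sum_mod4(["up", "up_down", "down"]))
print(loop_sum_mod4(["up", "up", "down", "down", "up_down"]))
```

With u paths of each of the types "up" and "down", the loop sum is -4u, which is 0 mod 4 regardless of how many up-then-down paths the loop contains.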

This  concludes  the  proof.  □ 

Recognize  that  for  the  proof  of  Theorem  5,  we  used  Lemma  3  (and  hence  the  tree  property) 
indirectly. 


Second  Proof  via  Matroids 


In this section, we provide a short (but much more involved) proof of Theorem 5 via matroid theory.
An excellent overview of matroids is given in the book [147]; in particular, we use Chapters 9
and 11 in this section.

We  restrict  ourselves  again  to  directed,  connected  graphs  G,  which  are  out-rooted  trees.  Recall 
the  following  notation  adapted  to  the  tree  G : 

• $B$ - arc-node incidence matrix of graph G with dimension $(n-1) \times n$

• $B^T$ - node-arc incidence matrix of graph G with dimension $n \times (n-1)$

• $I$ - identity matrix (with appropriate dimension)

• $0$ - matrix consisting of all 0 entries (with appropriate dimension)

• $C$ - matrix, as defined in (131).


Consider the following matrix

$$
A = \begin{pmatrix}
B^T & 0 & \cdots & 0 \\
0 & B^T & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & B^T \\
-I & -I & \cdots & -I
\end{pmatrix}
\qquad (135)
$$

where $I$ has dimension $(n-1) \times (n-1)$. After renumbering the columns, matrix $A^T$ is the matrix of
interest, C. Thus, A is TU if and only if C is TU.

With Lemma 1, we can consider the matrix $[A \mid I]$ given in (136), where each of the K identity
matrices added to the blocks $B^T$ has dimension $n \times n$ and the additional identity matrix
corresponding to the last block of $-I$ matrices has dimension $(n-1) \times (n-1)$.

In order to show that matrix $[A \mid I]$ is TU, we use specific transformations which do not alter
the TU property and transform $[A \mid I]$ into a node-arc incidence matrix of some graph G. More
specifically, we use the following


Proposition 2 ([142]) Let H be a node-arc incidence matrix of a graph and F be a basis. Then
matrix $F^{-1}H$ is TU.


Without loss of generality, the last row of $B^T$ corresponds to a leaf node of the tree. Let F be the
matrix derived from $B^T$ by deletion of that last row. Thus, F is square of dimension $(n-1) \times (n-1)$,
nonsingular (i.e., invertible) and TU (as a row with exactly one non-zero entry has been removed).
Furthermore, F is a basis of $B^T$. This is the step where we exploit the assumption that G is a
connected tree.

In matrix G, modify the last row with the $-I \mid 0$ matrices by adding multiples of the above rows.
Specifically, premultiply the rows corresponding to the rows present in matrix F of each $[B^T \mid I]$ with
$F^{-1}$ and then add the resulting matrix to the last row with the $-I \mid 0$ entries. This results in the
modification of the latter row.

Matrix G now reads as given in (137). Now multiply the last row with $-F$. This produces the
overall matrix (138).

Matrix G is TU, because in each column with two non-zero entries, these two entries
sum to 0; cf. [107, Proposition 2.6]. This completes the proof of Theorem 5.
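
Theorem 5 can be sanity-checked by brute force on very small trees. The following sketch builds the matrix A of (135) for a given tree and tests every square sub-matrix; it assumes the incidence convention -1 at the tail and +1 at the head of each arc, and the function names `det`, `is_totally_unimodular` and `build_A` are illustrative. The enumeration is exponential and is meant only as an illustration, not as part of the proof.

```python
from itertools import combinations


def det(m):
    """Integer determinant by Laplace expansion (adequate for tiny matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for c, entry in enumerate(m[0]):
        if entry:
            minor = [row[:c] + row[c + 1:] for row in m[1:]]
            total += (-1) ** c * entry * det(minor)
    return total


def is_totally_unimodular(m):
    """Check that every square sub-matrix has determinant in {-1, 0, 1}."""
    n_rows, n_cols = len(m), len(m[0])
    for size in range(1, min(n_rows, n_cols) + 1):
        for rows in combinations(range(n_rows), size):
            for cols in combinations(range(n_cols), size):
                sub = [[m[r][c] for c in cols] for r in rows]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True


def build_A(arcs, n, K):
    """Matrix A of (135): K diagonal blocks B^T above one block row of -I."""
    m = len(arcs)
    rows = []
    for k in range(K):
        for v in range(n):  # node rows of the k-th block B^T
            row = [0] * (K * m)
            for a, (i, j) in enumerate(arcs):
                if v == i:
                    row[k * m + a] = -1
                elif v == j:
                    row[k * m + a] = 1
            rows.append(row)
    for a in range(m):  # last block row: -I repeated K times
        row = [0] * (K * m)
        for k in range(K):
            row[k * m + a] = -1
        rows.append(row)
    return rows


A = build_A([(0, 1), (1, 2)], n=3, K=2)  # path 0 -> 1 -> 2, two scenarios
print(len(A), len(A[0]), is_totally_unimodular(A))
```

For the path with three nodes and two scenarios, A has 8 rows and 4 columns, so only a few hundred sub-matrices need to be examined.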

Note: By removing the one column of matrix F which has exactly one non-zero entry, the
resulting matrix G is the node-arc incidence matrix of some graph, as each column contains exactly
one +1 and one -1 entry.

This matrix transformation from (135) to (138) is a particular case of the single commodity
representation produced by the multicommodity network transformation described in [142], where
we mainly use step 3 b) of the transformation algorithm in [142].

The presented proof provides us with a different insight into two-stage minimum s-t cut
problems for trees: the two-stage minimum s-t cut problem for a tree G is equivalent to a single
commodity flow problem for the graph represented by G.

4.3  Computational  Complexity 

The fact that the constraint matrix of the two-stage stochastic minimum s-t cut problem loses its
total unimodularity raises the question about the theoretical computational complexity of the problem.
In this section, we show that the stochastic programming extension of the polynomially solvable
minimum s-t cut problem becomes NP-hard in general. This is consistent with similar observations for
the two-stage stochastic extensions of the minimum spanning tree [61] and maximum weight matching
[90] problems. For simplicity of further discussion, we consider the undirected version of the problem
in this section. Define the decision version of the two-stage stochastic s-t cut problem as follows:

Definition  3  (Decision  Version) 

Instance: A graph $G = (V, E)$ with node set $V$, arc set $E$, root $s \in V$, and $K$ scenarios. The k-th
scenario consists of a single terminal $t_k$ and has probability $p_k$ of being realized. Arc $ij \in E$ has cost
$c_{ij}$ in the first stage and $d_{ij}^k$ in the recourse stage (or second stage) if the k-th scenario is realized.
Question: Is there a set of arcs $E_0$ to be cut in the first stage and, for each scenario $k$, an arc
set $E_k$ to be cut in the recourse stage if scenario $k$ is realized, such that removing $E_0 \cup E_k$ from
the graph G disconnects $s$ from the terminal $t_k$, while the expected cost of cutting over all scenarios,
$$c := \sum_{ij \in E_0} c_{ij} + \sum_{k=1}^{K} p_k \sum_{ij \in E_k} d_{ij}^k,$$
does not exceed $C$, i.e., $c \leq C$?

We call the arc set $E_0 \cup E_k$ a feasible cut for scenario $k$ if the removal of $E_0 \cup E_k$ from the
graph G disconnects $s$ from the terminal $t_k$. In our reduction, we use the Multiterminal Cut (MC)
problem.
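
Checking a proposed answer to the decision version amounts to one connectivity test per scenario plus a cost evaluation. The sketch below shows such a check for the undirected version; the data layout (arcs as `frozenset` pairs, dictionaries `c` and `d[k]` for the costs) and the names `edge`, `is_connected` and `accepts` are assumptions made for illustration.

```python
from collections import deque


def edge(i, j):
    """Undirected arc as an order-independent key (no self-loops assumed)."""
    return frozenset((i, j))


def is_connected(edges, removed, s, t):
    """Breadth-first search from s to t that skips the removed arcs."""
    adj = {}
    for e in edges:
        if e in removed:
            continue
        i, j = tuple(e)
        adj.setdefault(i, []).append(j)
        adj.setdefault(j, []).append(i)
    seen, queue = {s}, deque([s])
    while queue:
        v = queue.popleft()
        if v == t:
            return True
        for u in adj.get(v, []):
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return False


def accepts(edges, s, terminals, probs, c, d, E0, Ek, C):
    """True if (E0, Ek) is feasible for every scenario and costs at most C."""
    cost = sum(c[e] for e in E0)  # first-stage cost
    for k, t in enumerate(terminals):
        if is_connected(edges, set(E0) | set(Ek[k]), s, t):
            return False  # scenario k is not separated
        cost += probs[k] * sum(d[k][e] for e in Ek[k])  # expected recourse cost
    return cost <= C


sa, at = edge("s", "a"), edge("a", "t")
c = {sa: 4, at: 4}
d = [{sa: 2, at: 9}, {sa: 9, at: 2}]
print(accepts([sa, at], "s", ["t", "t"], [0.5, 0.5], c, d, [], [[sa], [at]], 2))
```

In this toy instance, cutting nothing in the first stage and the cheaper arc in each scenario gives an expected cost of 0.5 * 2 + 0.5 * 2 = 2.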

Definition  4  (Multiterminal  Cut)  [42] 

Instance: A graph $G = (V, E)$, a set $S = \{s_1, s_2, \ldots, s_k\} \subseteq V$ of $k$ specified vertices or terminals,
a positive weight $w(e)$ for each arc $e \in E$, and a bound $B$.

Question: Is there a subset of arcs $E' \subseteq E$ with $w(E') \leq B$ such that the removal of $E'$ from $E$
disconnects each terminal from all others?

An arc set $E' \subseteq E$ is called a feasible cut for MC if the removal of $E'$ from $E$ disconnects each
terminal from all others. We need the following complexity result.

Theorem 6 ([42]) The Multiterminal Cut problem for k = 3 and arbitrary graphs is NP-complete
even if all edge weights are equal to 1.

Note  that  for  k  =  2  the  MC  problem  reduces  to  the  standard  minimum  s  —  t  cut  problem  which 
can  be  solved  in  polynomial  time. 

We now establish the strong NP-completeness of the two-stage stochastic s-t cut problem
by reducing the MC problem with $k = 3$ to it. We are given an instance of the MC problem
with $G = (V, E)$, respective weights $w_{ij}$ for each $ij \in E$, $S = \{s_1, s_2, s_3\}$ and bound $B$. Without
loss of generality, assume that arcs $s_1s_2$ and $s_1s_3$ do not exist in G, i.e., $s_1s_2, s_1s_3 \notin E$. Next we
construct an instance of the two-stage stochastic s-t cut problem such that there is a one-to-one
correspondence of their respective solutions. Define $G_{MC} = (\bar{V}, \bar{E})$ as follows:

• Let $\bar{V} = V \cup \{t\}$ and $\bar{E} = E \cup \{s_1s_2, s_1s_3, s_2t, s_3t\}$. In other words, we add an extra node and
four additional arcs into the original graph.

• First-stage arc capacities: $c_{ij} = w_{ij}$ $\forall\, ij \in E$ and $c_{ij} = +\infty$ for $ij \in \{s_1s_2, s_1s_3, s_2t, s_3t\}$.

• Second-stage arc capacities: $d_{ij}^1 = d_{ij}^2 = +\infty$ $\forall\, ij \in E$; $d_{s_1s_2}^1 = d_{s_3t}^1 = d_{s_1s_3}^2 = d_{s_2t}^2 = +\infty$ and
$d_{s_1s_3}^1 = d_{s_2t}^1 = d_{s_1s_2}^2 = d_{s_3t}^2 = 0$.

• Let the source node $s$ be $s_1$ and the sink node be $t$ for both scenarios.

• Let the probability of each scenario be given by $p_1 = p_2 = 1/2$.
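
The construction listed above is mechanical and can be sketched as a short function. The representation is an assumption made for illustration: arcs are `frozenset` pairs, `weights` maps each arc of the MC instance to its weight, and the new node is labelled `"t"`; the name `build_gmc` is hypothetical.

```python
import math


def edge(i, j):
    """Undirected arc as an order-independent key."""
    return frozenset((i, j))


def build_gmc(weights, s1, s2, s3, t="t"):
    """Build first- and second-stage capacities of the transformed instance.

    Returns (c, d, source, sink, probs) with d[0], d[1] for scenarios 1 and 2.
    """
    # Added arcs with second-stage capacity 0, per scenario.
    zero_in = [
        {edge(s1, s3), edge(s2, t)},  # scenario 1
        {edge(s1, s2), edge(s3, t)},  # scenario 2
    ]
    added = zero_in[0] | zero_in[1]
    if added & set(weights):
        raise ValueError("arcs s1s2 and s1s3 are assumed not to exist in G")
    c = dict(weights)  # first stage: original weights
    c.update({e: math.inf for e in added})  # added arcs cannot be cut early
    d = [{e: (0 if e in zeros else math.inf) for e in c} for zeros in zero_in]
    return c, d, s1, t, [0.5, 0.5]


weights = {edge("s1", "v"): 1, edge("s2", "v"): 1,
           edge("s3", "v"): 1, edge("s2", "s3"): 1}
c, d, source, sink, probs = build_gmc(weights, "s1", "s2", "s3")
print(len(c), source, sink, probs)
```

The toy MC instance used here has four arcs, so the transformed graph has eight arcs and one additional node.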

An example and the corresponding transformation are shown in Figure 5. The MC instance in
Figure 5(a) is a “YES” instance for $B > 10$, with the feasible cut set $E' = \{s_2s_3, s_24, s_34, s_35\}$. This
set is marked by the gray lines. Figure 5(b) shows the transformed graph $G_{MC}$ with a cut having
weight 11.

Lemma 7 An instance for the Multiterminal Cut problem with bound B is a “YES” instance if
and only if the transformed graph $G_{MC}$ is a “YES” instance for the two-stage stochastic s-t cut
problem with cost bound C = B.

Proof: “$\Rightarrow$” Let the MC problem have the feasible cut set $E'$ with $w := w(E') \leq B$. The first-stage
cuts for $G_{MC}$ are $E'$ as well. For the first scenario, arcs $s_1s_3$ and $s_2t$ are cut in the second stage,
while arcs $s_1s_2$ and $s_3t$ are cut in the second stage for scenario two. In this way, the cut weight
is $c = w$. This leads to a feasible cut for $G_{MC}$ for the two-stage s-t cut problem for both scenarios
with weight $c \leq C = B$.

“$\Leftarrow$” Suppose the solution of the constructed two-stage stochastic minimum s-t cut problem
is given by the arc set $E_0$ for the first stage and arc sets $E_1$ and $E_2$ for scenarios 1 and 2 in the
second stage, respectively, and cut weight $C$. Because $C < \infty$, any feasible two-stage s-t cut has
finite total capacity. We can observe the following:

• We need $E_0 \subseteq E$ since $c_{ij} = +\infty$ for any $ij \in \bar{E} \setminus E$.

• We may assume that $E_1 = \{s_1s_3, s_2t\}$ and $E_2 = \{s_1s_2, s_3t\}$ since the respective capacities
are zero, while all other capacities are $+\infty$.

• For scenario 1, since $d_{s_1s_2}^1 = d_{s_3t}^1 = +\infty$, arcs $s_1s_2$ and $s_3t$ are contained in the graph.
Therefore, $E_0$ must contain arcs that completely disconnect $s_1$ from $s_3$ and $s_2$ from $s_3$.

• For scenario 2, since $d_{s_1s_3}^2 = d_{s_2t}^2 = +\infty$, arcs $s_1s_3$ and $s_2t$ are contained in the graph.
Therefore, $E_0$ must contain arcs that completely disconnect $s_1$ from $s_2$ and $s_3$ from $s_2$.

Therefore, any two-stage stochastic s-t cut with finite total capacity must contain a set $E_0$ that
completely disconnects $s_1$, $s_2$ and $s_3$ from each other, i.e., $E_0$ is a multiterminal cut in the original
graph $G(V, E)$. Moreover, since the capacities of arcs in $E_1$ and $E_2$ are zero, minimizing the
total capacity of the two-stage s-t cut corresponds to minimizing the weight of the multiterminal
cut. Thus, $E_0$ is a feasible cut for MC with weight at most $B = C$. □

This  allows  us  to  prove  the  main  result. 

Theorem 7 The decision version of the two-stage stochastic s-t cut problem is NP-complete in
the strong sense even for two scenarios and the same terminal node for both scenarios.

Proof: The two-stage stochastic s-t cut problem is in NP, as a non-deterministic algorithm needs
only to guess the arcs to be cut and check if this leads to a feasible cut for each scenario and if the
cost of the cut is less than or equal to C.

The given transformation from MC to the two-stage stochastic s-t cut problem via graph $G_{MC}$
is valid according to Lemma 7. By replacing the weights $+\infty$ by $B + 1$, the node set, the arc set
and the arc weights of the constructed graph $G_{MC}$ are linearly bounded in the input size of MC.
Thus, the transformation can be done in (strongly) polynomial time. □

[Figure 5: (a) Instance for MC problem with cut. (b) Corresponding graph $G_{MC}$ with cut. MC instance
and corresponding instance $G_{MC}$ for the two-stage stochastic minimum s-t cut problem. The legend
for (b) is the same as in Figures 2 and 6, but for undirected arcs.]

Remark 1 The directed version of the two-stage stochastic s-t cut problem is also NP-complete
in the strong sense. To see this, one can use the same reduction as for the undirected case by
preserving the direction of the arcs to be cut in the construction presented in the proof of Lemma 7.

4.4  Linear  Running  Time  Algorithm  for  Trees 

As we discuss in Section 4.2.3, if graph $G$ is a tree, then the constraint matrix of (123)-(125) is totally unimodular. This implies that the two-stage stochastic minimum s-t cut problem is polynomially solvable if $G$ is a tree. Moreover, we show next that the problem admits a linear time solution algorithm.

Consider now the following transformation. Given an instance of the two-stage stochastic minimum s-t cut problem on graph $G = (V,E)$ with the notation of Definition 1, construct a graph $\bar{G} = (\bar{V}, \bar{E})$ with arc weight function $w : \bar{E} \to \mathbb{R}_+$ as follows:

• Add one additional node $T_k$ to $V$ for each scenario $k$; i.e., $\bar{V} := V \cup \bigcup_{k=1}^{K} \{T_k\}$.

• Add one arc between terminal $t_k$ and $T_k$ to $E$ for each scenario $k$; i.e., $\bar{E} := E \cup \bigcup_{k=1}^{K} \{t_k T_k\}$.

• Weights for $e \in E$ are the first stage cost; i.e., $w(ij) = c_{ij}$ $\forall ij \in E$.

• Weights for arcs $t_k T_k$ are constructed as follows. Let $P_k$ be the (unique) path from the source $s$ to $t_k$, and let $e_k$ be a least cost arc in scenario $k$ along this path $P_k$. Then $w(t_k T_k) = p_k d^k(e_k)$ $\forall k = 1, \dots, K$.

Graph $\bar{G}$ remains a tree by construction. Now, finding a minimum cut in $\bar{G}$ which separates $s$ from all terminals $T_k$ can be done via a linear time dynamic programming algorithm. Any such cut in $\bar{G}$ corresponds to a cut in $G$ (and vice versa) with the same cost as follows: if arc $e \in E$ of $\bar{G}$ is cut, then $e$ is cut in $G$ in the first stage; if arc $e = t_k T_k \in \bar{E} \setminus E$ is cut, then arc $e_k$ is cut in scenario $k$. This proves the following corollary:

Corollary  2  If  graph  G  is  a  tree,  then  the  two-stage  stochastic  minimum  s  —  t  cut  problem  can  be 
solved  in  linear  time. 
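
The dynamic program itself is not spelled out in the text. A minimal Python sketch of one way to realize it is given below, under assumptions that do not come from the source: the tree is stored as an undirected edge list and rooted at s, nodes are numbered 0, ..., n-1, every terminal differs from s, and the names `tree_two_stage_min_cut`, `arcs` and `scenarios` are hypothetical. Instead of adding the auxiliary nodes explicitly, the sketch charges the weight of each auxiliary arc (the probability times the cheapest scenario cost on the path from s to the terminal) directly at the terminal.

```python
import math


def tree_two_stage_min_cut(num_nodes, source, arcs, scenarios):
    """Expected cost of a minimum two-stage cut on a tree.

    arcs      -- list of (u, v, first_stage_cost, [second_stage_cost per scenario])
    scenarios -- list of (terminal, probability), one entry per scenario
    """
    adjacency = [[] for _ in range(num_nodes)]
    for index, (u, v, _, _) in enumerate(arcs):
        adjacency[u].append((v, index))
        adjacency[v].append((u, index))

    # Root the tree at the source; `order` lists every node after its parent.
    parent = [None] * num_nodes
    parent_arc = [None] * num_nodes
    visited = [False] * num_nodes
    visited[source] = True
    order, stack = [], [source]
    while stack:
        u = stack.pop()
        order.append(u)
        for v, index in adjacency[u]:
            if not visited[v]:
                visited[v] = True
                parent[v], parent_arc[v] = u, index
                stack.append(v)

    # Weight of the auxiliary arc of scenario k: p_k times the cheapest
    # scenario-k arc on the path from the source to the terminal t_k.
    cost = [0.0] * num_nodes
    for k, (terminal, probability) in enumerate(scenarios):
        cheapest, node = math.inf, terminal
        while node != source:
            cheapest = min(cheapest, arcs[parent_arc[node]][3][k])
            node = parent[node]
        cost[terminal] += probability * cheapest

    # Bottom-up pass: cost[v] is the cheapest way to separate v from every
    # auxiliary terminal below it; either cut the arc above v or pay cost[v].
    for v in reversed(order):
        if v != source:
            cost[parent[v]] += min(arcs[parent_arc[v]][2], cost[v])
    return cost[source]
```

The path walk costs at most O(n) per scenario, so the sketch runs in O(nK) time, which is linear in the input size because every arc carries K second-stage costs.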

Adjusting the formulation of the deterministic minimum s-t cut problem (118)-(121) to the cut problem for the constructed graph $\bar{G} = (\bar{V}, \bar{E})$ (the tree $G$ extended by the arcs $t_k T_k$), one obtains:

$$\min \sum_{ij \in \bar{E}} w(ij)\, y_{ij} \tag{139}$$

$$\text{s.t.} \quad y_{ij} \ge x_j - x_i \quad \forall ij \in \bar{E} \tag{140}$$

$$x_s = 0, \quad x_{T_k} = 1 \quad \forall k = 1, \dots, K \tag{141}$$

$$x_i,\ y_{ij} \in [0,1] \quad \forall i \in \bar{V},\ ij \in \bar{E}. \tag{142}$$

The constraint matrix defined by (140)-(141) is TU. However, this does not (at least in an obvious manner) imply that C is TU as well for the tree G.

4.5  Node-Based  Version


We define two mathematical programming formulations as equivalent if and only if both formulations have the same set of feasible and optimal solutions. It is well known that the minimum s-t cut problem can be equivalently reformulated as the following quadratic 0-1 program [25]:

$$\min \sum_{ij \in E} c_{ij} (1 - x_i) x_j \tag{143}$$

$$\text{s.t.} \quad x_s = 0, \quad x_t = 1 \tag{144}$$

$$x_i \in \{0,1\} \quad \forall i \in V. \tag{145}$$

We may also consider the relaxed concave quadratic programming problem:

$$\min \sum_{ij \in E} c_{ij} (1 - x_i) x_j \tag{146}$$

$$\text{s.t.} \quad x_s = 0, \quad x_t = 1 \tag{147}$$

$$0 \le x_i \le 1 \quad \forall i \in V. \tag{148}$$

For a concave minimization problem over a bounded polytope, there always is an optimal solution at a corner point of the polytope. Thus, formulations (143)-(145) and (146)-(148) are equivalent.
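
To make the quadratic 0-1 model concrete, the following small sketch enumerates all binary vectors with the source fixed to 0 and the sink fixed to 1 and evaluates the objective $\sum c_{ij}(1 - x_i)x_j$. It is only an illustration on a hypothetical three-node instance; the function name and the data are assumptions, and enumeration is of course not how one would solve a minimum cut problem in practice.

```python
from itertools import product


def min_cut_by_quadratic_program(nodes, capacities, s, t):
    """Minimize sum c_ij * (1 - x_i) * x_j over binary x with x_s = 0, x_t = 1."""
    free_nodes = [v for v in nodes if v not in (s, t)]
    best_value, best_x = float("inf"), None
    for bits in product((0, 1), repeat=len(free_nodes)):
        x = dict(zip(free_nodes, bits))
        x[s], x[t] = 0, 1
        # Arc ij contributes c_ij exactly when i is on the s-side (x_i = 0)
        # and j is on the t-side (x_j = 1).
        value = sum(c * (1 - x[i]) * x[j] for (i, j), c in capacities.items())
        if value < best_value:
            best_value, best_x = value, x
    return best_value, best_x


# Hypothetical instance: arcs s->a (3), a->t (1), s->t (2).
capacities = {("s", "a"): 3, ("a", "t"): 1, ("s", "t"): 2}
value, x = min_cut_by_quadratic_program(["s", "a", "t"], capacities, "s", "t")
print(value, x)
```

On this instance the minimum is attained with node a on the source side, so the cut consists of arcs a->t and s->t.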

Due to the nonlinearity of the objective function in (143), the resulting formulation (143)-(145) defines the minimum s-t cut problem without the variables $y_{ij}$ that appear in (118)-(121). As we discuss in Section 4.1, variables $y_{ij}$ have a clear interpretation in terms of the cutset, which can also be used for an equivalent definition of the deterministic minimum s-t cut problem. This observation leads to an alternative definition of the two-stage stochastic minimum s-t cut problem.

Definition 5 (two-stage stochastic minimum s-t cut; node-based version) Given is a directed graph $G = (V, E)$ with node set $V$ and arc set $E$ and a root $s \in V$. There are $K$ scenarios. The $k$th scenario consists of a single terminal $t_k$ and has probability $p_k$ of being realized. Arc $ij \in E$ has cost $c_{ij}$ in the first stage and $d_{ij}^k$ in the recourse stage (or second stage) if the $k$th scenario is realized. The task is to find two node sets $S, T \subset V$ and, for each scenario $k$, two additional node sets $S_k, T_k \subset V$ with $s \in S \cup S_k$, $t_k \in T \cup T_k$ and $S \cup S_k \cup T \cup T_k = V$, where $S$, $T$, $S_k$, $T_k$ are mutually disjoint. The objective is to minimize the expected cost over all scenarios:

$$z^{N*} := \min \left( \sum_{ij \in E:\ i \in S,\ j \in T} c_{ij} + \sum_{k=1}^{K} p_k \sum_{ij \in E:\ i \in S_k \cup S,\ j \in T_k \cup T,\ i \notin S \,\vee\, j \notin T} d_{ij}^k \right)$$

We  refer  to  this  definition  of  the  two-stage  stochastic  minimum  s  —  t  cut  problem  as  its  node-based 
version ;  the  original  definition  provided  in  Section  4.1  is  further  referred  to  as  the  arc-based  version. 

Consider  an  example  given  by  Figure  1  and  discussed  in  Section  4.2.  An  optimal  solution  using 
this  node-based  interpretation  is  shown  in  Figure  6.  In  the  first  stage,  neither  of  the  two  nodes  2 
and  3  is  assigned.  In  the  second  stage,  both  nodes  2  and  3  are  assigned  to  set  T,  corresponding  to 
the  terminal  node  4.  Hence,  arcs  (1,2)  and  (1,3)  are  cut  in  both  scenarios,  leading  to  a  total  cost  of 
11. Note that this solution is not unique.

Comparing the two optimal solutions from the arc-based version and the node-based version, we recognize that the difference between the two interpretations is the role of the sets $S$ and $T$. In the arc-based version, nodes 2 and 3 are not assigned to any of those sets, while the arcs (2,3) and (3,2) are both cut in the first stage. The assignments of the nodes are performed in the second stage, dependent on the scenario. The resulting solution cannot be obtained via the node-based version, as assignments of the nodes to one of the sets performed in the first stage are final. Generally speaking, in the node-based case the assignments of the nodes can be done either in the first stage or in the second stage, depending on the scenario that occurred. However, all assignments performed in the first stage are final and cannot be changed in the second stage. The cut is then the result of the assignments of the nodes in both stages.

Figure 6 (panels: first stage cut; second stage cut, scenario 1; second stage cut, scenario 2): In the first stage, no node is assigned. In the second stage, nodes 2 and 3 are assigned to set $T$. The second stage decision is the same for both scenarios. The cost of this minimum cut is 11.

Inspecting the two definitions of the two-stage minimum s-t cut problem, one observes that any solution of the node-based version is also feasible for the arc-based version with the same objective function value. Such a solution can be obtained as follows. Let $S$, $T$, $S_k$, $T_k$ be a partition of $V$ defining a valid cut for the node-based version. Then, define the following cut for the arc-based version:

$$E_0 := \{ ij \mid ij \in E \text{ and } i \in S \text{ and } j \in T \} \tag{149}$$

$$E_k := \{ ij \mid ij \in E \text{ and } i \in S \cup S_k \text{ and } j \in T_k \} \quad \forall k. \tag{150}$$

This is summarized in the next proposition.

Proposition 3 $z^{A*} \le z^{N*}$.

Furthermore, one can observe that the arc-based and node-based versions are equivalent when $G$ is a tree. The reason is that, for trees, selecting an arc $ij \in E$ in the first stage is equivalent to assigning node $i$ to set $S$ and node $j$ to set $T$.

Corollary 3 The arc-based and node-based versions of the two-stage stochastic minimum s-t cut problem are equivalent and solvable in linear time if graph $G$ is a tree.

One may wonder whether this difference between the arc-based and the node-based versions for directed graphs also holds for the corresponding undirected versions of the problems. An answer is provided in Figure 7, which demonstrates the difference between the arc-based and node-based versions of the two-stage stochastic minimum cut problem for undirected graphs.

Figure 7: Difference of node-based and arc-based versions for undirected graphs: (b) optimal solution for arc-based version with cost 2; (c) optimal solution for node-based version with cost 11. The legend is the same as in Figures 2 and 6, but for undirected arcs.

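We define the following:

$$x_i^S = \begin{cases} 1, & \text{if } i \in S \\ 0, & \text{otherwise} \end{cases} \quad \text{as well as} \quad x_i^T = \begin{cases} 1, & \text{if } i \in T \\ 0, & \text{otherwise} \end{cases} \tag{151}$$

and

$$z_{ik}^S = \begin{cases} 1, & \text{if } i \in S_k \\ 0, & \text{otherwise} \end{cases} \quad \text{and} \quad z_{ik}^T = \begin{cases} 1, & \text{if } i \in T_k \\ 0, & \text{otherwise.} \end{cases} \tag{152}$$

Consider the following bi-linear, linearly constrained 0-1 programming formulation:

$$\min \sum_{ij \in E} \left( c_{ij} x_i^S x_j^T + \sum_{k=1}^{K} p_k d_{ij}^k \left( x_i^S z_{jk}^T + z_{ik}^S x_j^T + z_{ik}^S z_{jk}^T \right) \right) \tag{153}$$

$$\text{s.t.} \quad x_i^S + x_i^T + z_{ik}^S + z_{ik}^T = 1 \quad \forall i \in V,\ k = 1, \dots, K; \tag{154}$$

$$x_s^S + z_{sk}^S = 1, \quad x_{t_k}^T + z_{t_k k}^T = 1 \quad \forall k = 1, \dots, K; \tag{155}$$

$$x_i^S,\ x_i^T,\ z_{ik}^S,\ z_{ik}^T \in \{0,1\} \quad \forall i \in V,\ \forall k = 1, \dots, K. \tag{156}$$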


Constraints (154) ensure that each node $i \in V$ is assigned to exactly one of the sets $S$, $T$, $S_k$ or $T_k$ for each scenario $k$, while constraints (155) make sure that $s \in S \cup S_k$ and $t_k \in T \cup T_k$ for each scenario $k$. The cost of the corresponding cut is evaluated in the objective (153) as follows: the first term sums the costs of all cut arcs $ij \in E$ with $i \in S$ and $j \in T$, while the second term sums the costs of all cut arcs $ij \in E$ with $i \in S_k \cup S$ and $j \in T_k \cup T$, excluding the arcs with $i \in S$ and $j \in T$ that are already counted in the first term. This proves the following result.

Proposition  4  Formulation  (153)  -  (156)  models  the  node-based  version  of  the  two-stage  minimum 
s  —  t  cut  problem  correctly. 
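
As an illustration of how the objective of the node-based model is evaluated, the sketch below computes the expected cost of a given assignment of nodes to the sets S, T, S_k and T_k. The function name, the data layout and the four-node instance are hypothetical (the instance is not the one of Figure 1), and the sketch does not check that the sets form a valid partition.

```python
def node_based_cost(arcs, S, T, scenarios):
    """Expected cost of a node-based two-stage cut.

    arcs      -- {(i, j): (first_stage_cost, [second_stage_cost per scenario])}
    scenarios -- list of (probability, S_k, T_k), one entry per scenario
    """
    # First-stage term: arcs leaving S and entering T.
    total = sum(c for (i, j), (c, _) in arcs.items() if i in S and j in T)
    for k, (probability, S_k, T_k) in enumerate(scenarios):
        for (i, j), (_, d) in arcs.items():
            # Second-stage term: the three bilinear products of the model,
            # i.e. (S, T_k), (S_k, T) and (S_k, T_k).
            cut_in_scenario = (
                (i in S and j in T_k)
                or (i in S_k and j in T)
                or (i in S_k and j in T_k)
            )
            if cut_in_scenario:
                total += probability * d[k]
    return total


# Hypothetical instance: source 1, terminal 4 in both scenarios.
arcs = {
    (1, 2): (9, [2, 4]),
    (1, 3): (9, [1, 3]),
    (2, 4): (9, [9, 9]),
    (3, 4): (9, [9, 9]),
}
scenarios = [(0.5, set(), {2, 3}), (0.5, set(), {2, 3})]
print(node_based_cost(arcs, {1}, {4}, scenarios))
```

Here no arc is cut in the first stage, and arcs (1,2) and (1,3) are cut in both scenarios.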

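By eliminating variables $z_{ik}^S$ using the relations in equalities (154) and (155), one obtains the following continuous, box-constrained, bi-linear optimization problem:

$$\min \sum_{ij \in E} \left( c_{ij} x_i^S x_j^T + \sum_{k=1}^{K} p_k d_{ij}^k \left( x_j^T + z_{jk}^T - x_i^S x_j^T - x_i^T x_j^T - z_{ik}^T x_j^T - x_i^T z_{jk}^T - z_{ik}^T z_{jk}^T \right) \right) \tag{157}$$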

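$$\text{s.t.} \quad x_s^T = z_{sk}^T = 0, \quad x_{t_k}^S = 0 \quad \forall k = 1, \dots, K \tag{158}$$

$$x_i^S,\ x_i^T,\ z_{ik}^T \in [0, 1] \quad \forall i \in V,\ \forall k = 1, \dots, K. \tag{159}$$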

Proposition  5  Formulations  (153)  -  (156)  and  (157)  -  (159)  are  equivalent. 

Proof: We  need  to  show  that  any  optimal  solution  of  (157)  -  (159)  is  binary.  This  follows  from  the 
fact  that  there  are  no  quadratic  terms  but  only  bi-linear  expressions  in  the  objective.  □ 

4.6  Concluding  Remarks 

Based on two equivalent formulations of the classical minimum s-t cut problem, we introduce two different versions (arc-based and node-based) of the two-stage stochastic minimum s-t cut problem. These versions are equivalent if the considered graph is a tree; however, in the general case they lead to different solutions. We provide a mathematical programming formulation for the arc-based version that is motivated by a standard linear 0-1 programming model for the deterministic minimum s-t cut problem. We prove that the constraint matrix of the new formulation loses its total unimodularity property in general; however, the matrix preserves the property if the considered graph is a tree. This is not surprising: we show that, similar to many other stochastic extensions of classical combinatorial optimization problems (e.g., minimum spanning tree [61]), the arc-based version of the two-stage stochastic minimum s-t cut problem is NP-hard. In the case of trees, the two-stage stochastic minimum s-t cut problem is polynomially solvable due to the total unimodularity property; we also describe a simple linear time solution algorithm. The computational complexity of the node-based version has yet to be fully explored (e.g., its NP-hardness remains open).

5  Bilevel  Knapsack  Problems  with  Stochastic  Right-Hand  Sides

The  details  of  the  work  in  this  chapter  can  be  found  in: 

•  O.Y.  Ozaltin,  O.A.  Prokopyev,  A.J.  Schaefer,  “The  Bilevel  Knapsack  Problem  with  Stochastic 
Right-Hand  Sides,”  Operations  Research  Letters ,  Vol.  38/4  (2010),  pp.  328-333. 

Bilevel  programs  [41,  45,  106]  model  the  hierarchical  relationship  between  two  autonomous,  and 
possibly  conflicting,  decision  makers:  the  leader  and  the  follower.  This  hierarchical  relationship 
results  from  the  fact  that  the  follower’s  problem  is  affected  by  the  decision  of  the  leader.  Moreover, 
the  follower’s  decision  in  return  affects  the  leader’s  problem. 

The bilevel knapsack problem was first considered by Dempe and Richter [46]. In this problem, the follower solves a 0-1 knapsack problem subject to the capacity set by the leader. The leader earns a profit from the items selected by the follower, and both decision makers seek to maximize their own profits. Dempe and Richter [46] formulated this problem as a mixed-integer bilevel program, and proposed a branch-and-bound algorithm. Recently, Brotcorne et al. [27] considered the same problem, and developed a dynamic programming algorithm that outperformed Dempe and Richter’s [46] branch-and-bound algorithm.

In our work we introduce the bilevel knapsack problem with stochastic right-hand sides (BKPS). BKPS is a stochastic extension of the bilevel knapsack problem where the leader’s decision has an uncertain effect on the follower’s knapsack capacity. We model this uncertainty using a finite set of scenarios. Brotcorne et al. [27] identified an application of the bilevel knapsack problem in revenue management, where a company (i.e., the leader) determines the number of units to sell by itself and hands the remainder over to an intermediary (i.e., the follower). In this context, BKPS arises when there is uncertainty in the number of units transferred to the intermediary. For example, in the distribution of perishable goods [72], some items may be spoiled during the shipment process.

Consider a set of $n$ items where each item $j \in \{1, \dots, n\}$ has an associated weight $a_j \in \mathbb{Z}_+$, and two revenues: the follower’s revenue $c_j \in \mathbb{R}_+$, and the leader’s revenue $d_j \in \mathbb{R}^1$. The follower must solve a knapsack problem to maximize her own objective subject to a capacity $h(\omega, y)$ that depends on the leader’s choice of $y$ as well as a discretely distributed random variable $\omega \in \Omega$. This yields the following stochastic bilevel program:

$$[\text{BKPS}] \quad \text{maximize} \quad f(y, X) = t y + \mathbb{E}_{\omega}\left[ d^T x(\omega, y) \right] \tag{160a}$$

$$\text{subject to} \quad \underline{b} \le y \le \bar{b}, \quad y \in \mathbb{R}^1, \tag{160b}$$

$$x(\omega, y) \in R(h(\omega, y)) \quad \forall \omega \in \Omega, \tag{160c}$$

where $R(h(\omega, y)) = \arg\max \left\{ c^T x : a^T x \le h(\omega, y),\ x \in \{0,1\}^n \right\}$, i.e., the follower’s rational reaction set. For $y \in [\underline{b}, \bar{b}]$, $X$ is an $|\Omega| \times n$ binary matrix whose rows represent the subset of items selected by the follower under the scenario $\omega \in \Omega$, i.e., $x(\omega, y)$. We assume that $h(\omega, y)$, defined on $\Omega \times \mathbb{Z}^1$, is a nondecreasing function of $y$, and that $h(\omega, \bar{b})$ is finite for all $\omega \in \Omega$.
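
The structure of the model can be illustrated by evaluating the leader's objective for a fixed decision y on a tiny instance. The sketch below enumerates the follower's reaction set for each scenario and then takes the reaction that is best for the leader; this optimistic tie-breaking rule, the function names, the capacity function `h` and the data are all assumptions made for illustration and are not taken from the text. The sketch also assumes nonnegative capacities, so that the empty selection is always feasible.

```python
from itertools import product


def follower_reaction(a, c, capacity):
    """All optimal 0-1 knapsack solutions (the reaction set) by enumeration."""
    best, reactions = float("-inf"), []
    for x in product((0, 1), repeat=len(a)):
        if sum(ai * xi for ai, xi in zip(a, x)) <= capacity:
            value = sum(ci * xi for ci, xi in zip(c, x))
            if value > best:
                best, reactions = value, [x]
            elif value == best:
                reactions.append(x)
    return reactions


def leader_value(y, t, d, a, c, scenarios, h):
    """Leader's objective t*y + E[d^T x] with optimistic tie-breaking.

    scenarios -- list of (omega, probability); h(omega, y) is the capacity.
    """
    total = t * y
    for omega, probability in scenarios:
        reactions = follower_reaction(a, c, h(omega, y))
        total += probability * max(
            sum(dj * xj for dj, xj in zip(d, x)) for x in reactions
        )
    return total
```

For example, with two items, `a = [2, 3]`, `c = [3, 4]`, `d = [5, 1]`, `t = 1`, and a capacity of `y - 1` in one scenario and `y` in the other (each with probability 0.5), `leader_value(3, ...)` sums the leader's direct term and the expected revenue from the follower's selections.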

In  our  paper  cited  above  we  provide  necessary  and  sufficient  conditions  for  the  existence  of  an 
optimal  solution.  When  the  leader’s  decisions  can  take  only  integer  values,  we  present  an  equivalent 
two-stage  stochastic  programming  reformulation  with  binary  recourse.  We  develop  a  branch-and- 
cut  algorithm  for  solving  this  reformulation,  and  a  branch-and-backtrack  algorithm  for  solving 
the  scenario  subproblems.  Computational  experiments  indicate  that  our  approach  can  solve  large 
instances in a reasonable amount of time.

6  Two-Stage  Stochastic  Assignment  Problems

This  chapter  is  mostly  based  on  the  results  from: 

•  S.  Karademir,  O.A.  Prokopyev,  N.  Kong,  “On  Greedy  Approximation  Algorithms  for  a  Class 
of  Two-Stage  Stochastic  Assignment  Problems,”  Technical  report,  2011 

6.1  Introduction 

Given a set $V$ of $n$ agents, a set $U$ of $n$ jobs and a weight (or cost) $w_{ij}$ for each $i \in V$ and $j \in U$, the well-known linear assignment problem consists of assigning each agent to exactly one job in such a manner that each job is performed by exactly one of the agents and the total weight (cost) of the obtained assignment is maximized (minimized). In this chapter we consider the maximization version; the mathematical program for the related linear assignment problem can be given as follows [113]:

$$\max_{x} \sum_{i \in V} \sum_{j \in U} w_{ij} x_{ij} \tag{161}$$

$$\text{s.t.} \quad \sum_{j \in U} x_{ij} = 1, \quad \text{for all } i \in V, \tag{162}$$

$$\sum_{i \in V} x_{ij} = 1, \quad \text{for all } j \in U, \tag{163}$$

$$x_{ij} \in \{0, 1\}, \quad \text{for all } i \in V,\ j \in U. \tag{164}$$

Linear assignment problem (161)-(164) is also known as the weighted bipartite matching problem [113]. Namely, given a weighted bipartite graph $G(V \cup U, E)$ with $|V| = |U|$ and arc weights $w_{ij}$ for all $(i,j) \in E$, we need to find a perfect matching of maximum weight. Recall that a perfect matching is a matching which matches all vertices of the graph.

It  is  well  known  that  the  constraint  matrix  for  (162)-(163)  is  totally  unimodular  [113].  Therefore, 
we  can  safely  remove  integrality  constraints  (164)  and  solve  the  linear  programming  relaxation 
(161)-(163)  to  get  the  optimal  solution.  However,  the  most  popular  approach  to  tackle  the  linear 
assignment  problem  is  the  Hungarian  Method  [91],  which  can  be  considered  as  an  implementation 
of  the  primal-dual  method  for  the  respective  minimum  cost  flow  problem  [113].  The  Hungarian 
Method  (HM)  works  with  the  dual  of  the  linear  program  (161)-(163)  given  by 


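$$\min_{\alpha, \beta} \ \sum_{i=1}^{n} \alpha_i + \sum_{j=1}^{n} \beta_j \tag{165}$$

$$\text{s.t.} \quad \alpha_i + \beta_j \ge w_{ij}, \quad i = 1, \dots, n,\ j = 1, \dots, n. \tag{166}$$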
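
The relation between the assignment problem and its dual can be checked numerically on a small example. The sketch below, which uses plain enumeration rather than the Hungarian Method and hypothetical function names and data, computes the optimal assignment value and compares it with the value of one simple dual feasible solution (each $\alpha_i$ set to the largest weight in row $i$ and all $\beta_j$ set to zero). By weak duality the dual value can never be smaller than the assignment value.

```python
from itertools import permutations


def assignment_optimum(w):
    """Maximum total weight of a perfect assignment, by enumeration."""
    n = len(w)
    return max(
        sum(w[i][perm[i]] for i in range(n)) for perm in permutations(range(n))
    )


def row_max_dual(w):
    """A dual feasible point: alpha_i = max_j w_ij, beta_j = 0."""
    alpha = [max(row) for row in w]
    beta = [0] * len(w)
    return alpha, beta


w = [[4, 1, 3], [2, 0, 5], [3, 2, 2]]  # hypothetical weights
alpha, beta = row_max_dual(w)
print(assignment_optimum(w), sum(alpha) + sum(beta))
```

The Hungarian Method can be viewed as improving such a dual solution until its value matches that of a perfect assignment.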

In this chapter we are concerned with the following two-stage stochastic programming extension of (161)-(164), which is further referred to as the two-stage stochastic linear assignment (2SSLA) problem. Each edge $(i,j)$, $i \in V$ and $j \in U$, is associated with the first-stage weight $w_{ij}$ and the second-stage weight $q_{ij}^k$ for scenario $k$, $k = 1, \dots, K$. The first-stage decision $x$ is to choose some matching in $G$ that is not necessarily perfect. At the second stage a scenario $k$ is realized with probability $p_k$. For each scenario $k$, the second-stage decision $y^k$ is to choose a matching over those agents and jobs that are unmatched in the first stage in order to form a perfect matching. The overall goal is to find a perfect matching with the maximum expected weight. Then the two-stage stochastic programming extension of (161)-(164) can be written as follows:

$$\max_{x, y} \ \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} x_{ij} + \sum_{k=1}^{K} p_k \sum_{i=1}^{n} \sum_{j=1}^{n} q_{ij}^k y_{ij}^k \tag{167}$$

$$\text{s.t.} \quad \sum_{j=1}^{n} \left( x_{ij} + y_{ij}^k \right) = 1, \quad i = 1, \dots, n,\ k = 1, \dots, K, \tag{168}$$

$$\sum_{i=1}^{n} \left( x_{ij} + y_{ij}^k \right) = 1, \quad j = 1, \dots, n,\ k = 1, \dots, K, \tag{169}$$

$$x_{ij} \in \{0, 1\}, \quad y_{ij}^k \in \{0, 1\}, \quad i, j = 1, \dots, n,\ k = 1, \dots, K. \tag{170}$$
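
For very small instances the two-stage model can be solved by enumeration, which is useful as a reference when testing approximation algorithms. The sketch below is such a brute-force reference solver, not a method from the text: it tries every first-stage partial matching and completes it optimally in each scenario. The function names and the data layout (`w[i][j]` for first-stage weights, `q[k][i][j]` for second-stage weights, `p[k]` for probabilities) are assumptions.

```python
from itertools import combinations, permutations


def best_completion(weights, agents, jobs):
    """Maximum weight of a perfect assignment of `agents` to `jobs`."""
    return max(
        sum(weights[i][j] for i, j in zip(agents, perm))
        for perm in permutations(jobs)
    )


def solve_2ssla_by_enumeration(w, q, p):
    """Optimal expected weight of the two-stage assignment problem."""
    n = len(w)
    best = float("-inf")
    for size in range(n + 1):
        for first_agents in combinations(range(n), size):
            rest_agents = [i for i in range(n) if i not in first_agents]
            for first_jobs in permutations(range(n), size):
                first = sum(w[i][j] for i, j in zip(first_agents, first_jobs))
                rest_jobs = [j for j in range(n) if j not in first_jobs]
                # Each scenario completes the matching on what is left.
                second = sum(
                    pk * best_completion(qk, rest_agents, rest_jobs)
                    for pk, qk in zip(p, q)
                )
                best = max(best, first + second)
    return best
```

The running time grows factorially with n, so this is only meant for instances with a handful of agents.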

A number of studies demonstrate the advantage of stochastic programming models over deterministic approaches [23]. Recent examples of these types of studies in the literature include two-stage stochastic extensions of the shortest path [70], minimum spanning tree [49, 61], min-cut [48, 70], and Steiner tree [75] problems. For a detailed introduction to stochastic programming, we refer the reader to [23, 133]. Next we briefly describe two papers that are most closely related to our work in this chapter.

Kong and Schaefer [88] consider the two-stage stochastic maximum weight matching problem on general graphs. They show that the problem is NP-hard and propose a greedy $\frac{1}{2}$-approximation algorithm. Escoffier et al. [52] prove that the two-stage stochastic maximum weight matching problem is APX-complete even for bipartite graphs of maximum degree 4 and general graphs of degree 3, which implies that there is no polynomial-time approximation scheme (PTAS) for this problem as long as P $\ne$ NP. Based on the concepts from [88], they also provide a greedy $\max\left\{ \frac{K}{2K-1}, \frac{\Delta}{2\Delta-1} \right\}$-approximation algorithm, where $K$ is the number of scenarios in the second stage and $\Delta$ is the degree of the bipartite graph.

Our work is essentially built on these two studies [52, 88]. First, we consider the greedy approximation methods from these papers for the two-stage stochastic linear assignment problem. Since the maximum weight matching problem on bipartite graphs can be easily reduced to the linear assignment problem via addition of dummy agents and/or jobs, the 2SSLA problem is also APX-complete. We propose a necessary optimality condition that generalizes and unifies the key ideas behind the two algorithms by Kong and Schaefer [88] and Escoffier et al. [52]. Then, based on this optimality condition, we design a new greedy approximation algorithm referred to as EGA. While the developed approach preserves the existing approximation guarantees, we are not able to prove whether EGA provides a better approximation bound. However, analytical observations and extensive computational results indicate that EGA performs strictly better on some rather broad classes of the two-stage stochastic linear assignment problem.

6.2  Greedy  Approximation  Algorithms 
6.2.1  Basic  Greedy  Approach 

Since the 2SSLA problem is NP-hard, we cannot expect to solve it exactly for large input sizes. Hence, we seek an approximation algorithm with a reasonable performance guarantee. We first discuss the Greedy Algorithm (GA) for the more general two-stage stochastic matching problem given in [88]. Since linear assignment is a special case of the maximum weight matching problem, this algorithm can be simply adapted to 2SSLA with the same performance guarantee. For further discussion we need the following notation:

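Definition 1 A first-stage myopic solution is an optimal solution to:

$$(\text{GA-I}): \quad \max \left\{ \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} x_{ij} \ : \ \forall j \ \sum_{i=1}^{n} x_{ij} = 1, \ \forall i \ \sum_{j=1}^{n} x_{ij} = 1; \ \forall i,j \ x_{ij} \in \{0,1\} \right\} \tag{171}$$

Definition 2 A second-stage myopic solution for scenario $k$ is an optimal solution to:

$$(\text{GA-II}): \quad \max \left\{ \sum_{i=1}^{n} \sum_{j=1}^{n} q_{ij}^k y_{ij}^k \ : \ \forall j \ \sum_{i=1}^{n} y_{ij}^k = 1, \ \forall i \ \sum_{j=1}^{n} y_{ij}^k = 1; \ \forall i,j \ y_{ij}^k \in \{0,1\} \right\} \tag{172}$$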

First- and second-stage myopic solutions are the solutions of deterministic linear assignment problems with the appropriate choices of weights in the objective functions. Let $x^{GA}$ and $Z_1^{GA}$ be the first-stage myopic solution and the respective optimal objective function value. Similarly, let $y_k^{GA}$ and $Z_{2k}^{GA}$ be the second-stage myopic solution and the respective optimal objective function value for scenario $k$. Finally, denote by $Z_2^{GA}$ the expected value of the second-stage myopic solutions, i.e.,

$$Z_2^{GA} = \sum_{k=1}^{K} p_k Z_{2k}^{GA}.$$

GA works as follows. Initially, it finds the first-stage myopic solution (GA-I) as well as the second-stage myopic solutions for each scenario (GA-II). Then it compares the objective function value of the first-stage myopic solution with the expected objective function value of the second-stage myopic solutions. The final assignment weight $Z^{GA}$ corresponds to the better of them and is given by

$$Z^{GA} = \max \left\{ Z_1^{GA}, Z_2^{GA} \right\}. \tag{173}$$

The final agent-job assignments are given either by $(x^{GA}, 0, \dots, 0)$ or by $(0, y_1^{GA}, \dots, y_K^{GA})$, respectively. That is, all assignments are made completely either at the first stage or at the second stage for each scenario.
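
The steps of GA can be sketched in a few lines. The version below is only an illustration: it replaces the Hungarian Method by plain enumeration (so it is exponential in n), and the function names and the data layout (`w[i][j]`, `q[k][i][j]`, `p[k]`) are assumptions.

```python
from itertools import permutations


def best_assignment(weights):
    """Maximum-weight perfect assignment by enumeration: (value, mate)."""
    n = len(weights)
    return max(
        (sum(weights[i][perm[i]] for i in range(n)), perm)
        for perm in permutations(range(n))
    )


def greedy_algorithm(w, q, p):
    """Return (weight, stage): all assignments are made in stage 1 or stage 2."""
    z1, _ = best_assignment(w)  # first-stage myopic value
    z2 = sum(pk * best_assignment(qk)[0] for pk, qk in zip(p, q))  # expected
    return (z1, 1) if z1 >= z2 else (z2, 2)


# Hypothetical instance with two agents, two jobs and two scenarios.
w = [[3, 0], [0, 1]]
q = [[[0, 0], [0, 4]], [[0, 0], [0, 2]]]
p = [0.5, 0.5]
print(greedy_algorithm(w, q, p))
```

On this instance GA assigns everything in the first stage with weight 4, whereas assigning agent 0 to job 0 in the first stage and agent 1 to job 1 in the second stage yields an expected weight of 6. This illustrates that GA cannot mix the two stages.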

Theorem 3 Greedy Algorithm is an approximation algorithm with the performance guarantee $\frac{1}{2}$ for the 2SSLA problem.

Proof: Same as the proof in [88] for the two-stage stochastic matching problem. □

Since GA solves each stage and each scenario separately, it actually solves the deterministic linear assignment problem $K + 1$ times. Thus, one can utilize the Hungarian Method (HM) [91] to solve each assignment problem. Consequently, given the complexity of HM, the outlined algorithm obtains a $\frac{1}{2}$-approximate solution of (167)-(170) in $O(Kn^3)$ arithmetic operations.


6.2.2  Greedy  Approach  of  Escoffier  et  al.

In this section we describe a slightly more advanced greedy approach proposed by Escoffier et al. in [52]. We will refer to this algorithm as GAE. The basic idea is that if $\sum_{k=1}^{K} p_k q_{ij}^k > w_{ij}$ for some $i$ and $j$, then it cannot be optimal to assign agent $i$ to job $j$ in the first stage. This result follows from the observation that any solution to the 2SSLA problem that assigns $i$ to $j$ in the first stage could be improved by assigning $i$ to $j$ in the second stage across all scenarios. In fact, as we show in Section 6.3, this result is a special case of a more general necessary optimality condition.

GAE works as follows. Initially, it replaces all first-stage weights $w_{ij}$ with

$$w_{ij} := \max \left\{ w_{ij}, \ \sum_{k=1}^{K} p_k q_{ij}^k \right\}$$

and obtains the first-stage myopic solution with the updated weights. Then, all agent-job assignments, i.e., $j = mate[i]$ and $i = mate[j]$, such that $w_{ij} = \sum_{k=1}^{K} p_k q_{ij}^k$ are moved to the second stage. Subsequently, GAE solves $K$ assignment problems for all agents and jobs moved to the second stage across all scenarios. Denote the resulting solution by $Z_1^{GAE}$ and the above described algorithmic procedure by GAE-I.

Next GAE compares $Z_1^{GAE}$ with the expected objective function value of the second-stage myopic solutions $Z_2^{GA}$. The final assignment weight $Z^{GAE}$ corresponds to the better of them and is given by

$$Z^{GAE} = \max \left\{ Z_1^{GAE}, Z_2^{GA} \right\}.$$

Theorem 4 ([52]) GAE is an approximation algorithm with the performance guarantee $\frac{K}{2K-1}$ for 2SSLA.

Theorem 5 ([52]) GAE is an approximation algorithm with the performance guarantee $\frac{\Delta}{2\Delta-1}$ for 2SSLA, where $\Delta$ is the degree of the bipartite graph.

Both approximation bounds listed above are slightly better than the $\frac{1}{2}$ approximation bound of GA. Furthermore, it is easy to observe that the running time of GAE is given by $O(Kn^3)$.
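
A compact sketch of GAE, following the description above, is given next. As before, enumeration stands in for the Hungarian Method, so the sketch does not achieve the stated running time; the function names and the data layout (`w[i][j]`, `q[k][i][j]`, `p[k]`) are assumptions. A pair is treated as moved to the second stage when its expected second-stage weight is at least its first-stage weight, which is how the sketch interprets ties.

```python
from itertools import permutations


def best_assignment(weights, agents, jobs):
    """Maximum-weight perfect assignment of agents to jobs: (value, jobs order)."""
    return max(
        (sum(weights[i][j] for i, j in zip(agents, perm)), perm)
        for perm in permutations(jobs)
    )


def gae(w, q, p):
    """Weight of the solution returned by the greedy approach GAE."""
    n = len(w)
    everyone = list(range(n))
    expected_q = [
        [sum(pk * qk[i][j] for pk, qk in zip(p, q)) for j in everyone]
        for i in everyone
    ]
    updated = [[max(w[i][j], expected_q[i][j]) for j in everyone] for i in everyone]

    # GAE-I: first-stage myopic solution under the updated weights.
    _, mate = best_assignment(updated, everyone, everyone)
    pairs = list(zip(everyone, mate))
    kept = [(i, j) for i, j in pairs if w[i][j] > expected_q[i][j]]
    moved = [(i, j) for i, j in pairs if w[i][j] <= expected_q[i][j]]
    moved_agents = [i for i, _ in moved]
    moved_jobs = [j for _, j in moved]
    z1_gae = sum(w[i][j] for i, j in kept) + sum(
        pk * best_assignment(qk, moved_agents, moved_jobs)[0]
        for pk, qk in zip(p, q)
    )

    # Expected value of the second-stage myopic solutions, as in GA.
    z2_ga = sum(
        pk * best_assignment(qk, everyone, everyone)[0] for pk, qk in zip(p, q)
    )
    return max(z1_gae, z2_ga)
```

For instance, with `w = [[3, 0], [0, 1]]`, `q = [[[0, 0], [0, 4]], [[0, 0], [0, 2]]]` and `p = [0.5, 0.5]`, the pair (agent 1, job 1) is moved to the second stage while (agent 0, job 0) stays in the first stage.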


6.3  Necessary  Optimality  Condition 

In this section we describe a necessary optimality condition for the 2SSLA problem. Let $A \subseteq V$ be a subset of agents and $J \subseteq U$ be a subset of jobs such that $|A| = |J|$. Since the cardinalities of sets $A$ and $J$ are the same, we can consider a two-stage stochastic linear assignment problem on these subsets of agents and jobs. This assignment will be a perfect one as all agents and jobs can be matched. Let $W_1[A, J]$ be the value of the first-stage myopic solution and $W_2[A, J]$ be the expected value of the second-stage myopic solutions over all scenarios. Next we can state the following result:

Proposition 4 (Necessary Optimality Condition) Let $A \subseteq V$, $J \subseteq U$ and $|A| = |J|$. If $W_1[A, J] < W_2[A, J]$ ($W_1[A, J] > W_2[A, J]$), then no optimal solution of 2SSLA can contain a perfect assignment between agents in $A$ and jobs in $J$ in the first stage (second stage).

Proof: Consider an optimal solution of an instance of the 2SSLA problem. If $W_1[A, J] < W_2[A, J]$ ($W_1[A, J] > W_2[A, J]$) and the optimal solution contains a perfect assignment between $A$ and $J$ in the first stage (second stage), then moving assignments between $A$ and $J$ to the second stage (first stage) would increase the weight of the optimal solution, which contradicts our assumption that the solution is optimal. □

Unfortunately, this necessary optimality condition is not sufficient for a solution of 2SSLA to be optimal even if we check it for all $O(2^n)$ possible different subsets of $V$ and $U$. Consider the following simple instance of the 2SSLA problem given in Figure 8.

Figure 8: A counterexample to show that the necessary optimality condition given by Proposition 4 is not sufficient for optimality. Only arcs with nonzero weight are shown.

In the first stage, agents $a_1$ and $a_2$ are assigned to jobs $j_2$ and $j_1$, respectively. In the second stage, agent $a_3$ is assigned to job $j_3$ under both scenarios. The total weight of the assignment is 4 units. Notice that this solution does not violate the necessary optimality condition for any subset of agents and jobs. However, the optimal assignment has a total weight of 5 units, which is achieved by assigning $a_1$ to $j_1$, $a_2$ to $j_3$, and $a_3$ to $j_2$ in the first stage. Therefore, the necessary optimality condition given by Proposition 4 is not sufficient to guarantee optimality.

Nevertheless, Proposition 4 can be used to construct approximation algorithms. In
fact, algorithms GA (Section 6.2.1) and GAE (Section 6.2.2) are based on the necessary optimality
condition for some specific subsets of agents and jobs. Observe that GA is the implementation of
Proposition 4 when $|A| = |J| = n$. In other words, GA verifies the necessary optimality condition
only for $A = V$ and $J = U$. If $W_1[V, U]$ (i.e., the weight of the first-stage myopic solution given by
GA-I) is less than $W_2[V, U]$ (i.e., the expected weight of the second-stage myopic solutions over
all scenarios given by GA-II), then it moves assignments between $V$ and $U$ to the second stage.
Otherwise all assignments are made in the first stage.

Similarly, GAE is the implementation of the necessary optimality condition for all sets $A$ and $J$
such that $|A| = |J| = 1$ and $|A| = |J| = n$. As we have stated in Section 6.2.2, GAE-I moves an
assignment $(i,j)$ to the second stage if it has a better expected weight in the second stage. Thus,
$A = \{i\}$, $J = \{j\}$, $W_1[A, J] = w_{ij}$, and $W_2[A, J] = \sum_{k=1}^{K} p_k q_{ij}^k$. Next, GAE compares $Z_1^{GAE}$ with
$Z_2^{GA} = W_2[V, U]$ and outputs the better solution, which is equivalent to checking the necessary
optimality condition for $|A| = |J| = n$. The only difference here is that instead of $W_1[V, U]$, we use
the solution of GAE-I and compare it to $W_2[V, U]$.

6.4  Enhanced  Greedy  Approach 

In  this  section  we  propose  a  more  generic  approximation  algorithm,  further  referred  to  as  Enhanced 
Greedy  Algorithm  (EGA),  that  attempts  to  utilize  the  necessary  optimality  condition  described 
by  Proposition  4  in  a  more  sophisticated  manner.  EGA  is  based  on  the  Hungarian  Method  as  a 
standard  routine  to  solve  all  the  deterministic  linear  assignment  subproblems.  Recall  that  HM  works 
with  the  dual  problem  (165)-(166).  Furthermore,  EGA  utilizes  the  dual  problem  of  the  LP  relaxation 
of  (167)-(169)  given  by: 
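
$$\min \; \sum_{i=1}^{n} \sum_{k=1}^{K} \alpha_{ik} + \sum_{j=1}^{n} \sum_{k=1}^{K} \beta_{jk} \tag{174}$$

$$\text{s.t.} \quad \sum_{k=1}^{K} (\alpha_{ik} + \beta_{jk}) \ge w_{ij}, \quad i = 1, \dots, n, \; j = 1, \dots, n, \tag{175}$$

$$\alpha_{ik} + \beta_{jk} \ge p_k q_{ij}^k, \quad i = 1, \dots, n, \; j = 1, \dots, n, \; k = 1, \dots, K. \tag{176}$$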

In  the  remainder  of  this  chapter  we  will  refer  to  (175)  and  (176)  as  the  “first-stage”  and  “second- 
stage”  dual  constraints,  respectively,  as  they  correspond  to  the  assignment  weights  at  the  first  and 
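second stage. Furthermore, for convenience of notation, we let $q_{ij}^k := p_k q_{ij}^k$, i.e., from here on the
scenario probability is absorbed into the second-stage weight.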

EGA  has  two  major  steps.  The  first  step  (further  referred  to  as  EGA-I)  is  to  start  with  first- 
stage  myopic  solution  and  then  attempt  to  improve  the  objective  value  by  “moving”  some  of  the 
assignments  to  the  second  stage  via  checking  the  necessary  optimality  condition.  The  second  step 
of  the  EGA  (further  referred  to  as  EGA-II)  is  to  start  with  the  second-stage  myopic  solutions  and 
then  attempt  to  improve  the  objective  value  by  “moving”  some  of  the  assignments  to  the  first  stage. 
Then  similar  to  GA  and  GAE,  EGA  chooses  the  solution  with  the  better  objective  value  and  outputs 
it  as  the  final  solution. 

We  want  to  note  that  there  is  a  strong  relationship  between  the  necessary  optimality  condition 
given  by  Proposition  4  and  the  feasibility  of  the  dual  program  (175)-(176).  The  first-stage  myopic 
solution  is  feasible  to  the  first-stage  dual  constraints  (175)  and  the  second-stage  myopic  solutions  are 
feasible  to  the  second-stage  dual  constraints  (176).  However,  the  myopic  solutions  are  not  necessarily 
feasible  to  both  (175)  and  (176)  simultaneously.  EGA-I  starts  with  the  first-stage  myopic  solution 
feasible  to  the  first-stage  dual  constraints,  and  uses  the  necessary  optimality  condition  to  achieve 
feasibility  of  the  second-stage  dual  constraints.  Specifically,  it  will  be  shown  that  the  necessary 
optimality  condition  for  specific  pairs  of  subsets  of  agents  and  jobs  corresponds  to  a  second-stage 
aggregated  dual  constraint,  which  is  obtained  by  aggregating  the  respective  subset  of  the  second- 
stage  dual  constraints.  Similarly,  EGA-II  starts  with  the  second-stage  myopic  solutions  feasible  to 
the  second-stage  dual  constraints,  and  uses  the  necessary  optimality  condition  to  achieve  feasibility 
of  a  first-stage  aggregated  dual  constraint,  which  is  obtained  by  aggregating  a  subset  of  the  first- 
stage  dual  constraints. 

6.4.1  Improving  the  first-stage  assignment  (EGA-I) 

EGA-I  starts  with  all  agent-job  assignments  made  in  the  first  stage  (i.e.,  the  first-stage  myopic 
solution)  and  then  attempts  “moving”  some  of  them  to  the  second  stage  if  it  is  worth  doing  so. 
Next  we  briefly  describe  the  key  ideas  behind  EGA-I.  The  pseudo-code  of  the  approach  is  given  by 
Algorithm  1. 

The initial step of our algorithm is essentially GAE described in Section 6.2.2. EGA-I applies the
necessary optimality condition given by Proposition 4 and starts with the sets of unit cardinality, i.e.,
$|A| = |J| = 1$. For every agent-job pair $(i, j)$, $A = \{i\}$ and $J = \{j\}$, we know that $W_1[\{i\}, \{j\}] = w_{ij}$
and we calculate $W_2[\{i\}, \{j\}] = \sum_{k=1}^{K} q_{ij}^k$. Then, we make the following weight update in the first
stage:

$$w_{ij} = \max \{ W_1[\{i\}, \{j\}], \, W_2[\{i\}, \{j\}] \} = \max \Big\{ w_{ij}, \, \sum_{k=1}^{K} q_{ij}^k \Big\}. \tag{177}$$

Algorithm 1: EGA-I

Input: $n$ agents, $n$ jobs, $K$ scenarios, $w_{ij}$, $q_{ij}^k$, $p_k$.

 1  Run GAE-I
 2  Reset all the first-stage weights to their original values and remove from consideration all
    agents and jobs moved to the second stage by GAE-I
 3  Let $q_{ij}^k := p_k q_{ij}^k$; define $G_0$ and $G_k$ to be the graphs for the first stage and the $k$th scenario in
    the second stage, respectively
 4  Run Hungarian Method on $G_k$ for all $k$
 5  Let $C$ include closed subsets of agents and jobs in the obtained second-stage solution
 6  foreach $\{A, J\} \in C$ do
 7      Let $\Delta = W_2[A, J] - (\sum_{i \in A} \alpha_i + \sum_{j \in J} \beta_j)$, where $\alpha, \beta$ is the dual solution for $G_0$
 8      if $\Delta > 0$ then
 9          Let $G$ be the graph containing only $A$ and $J$
10          Run Hungarian Method on $G$
11          Let $E$ be the set of edges selected in the resulting assignment
12          Let $E = \{e_1, e_2, \dots, e_{|A|}\}$ be sorted in non-decreasing order of edge weight
13          Set $w_{e_{|A|+1}} = \infty$ and let $w \in \mathbb{R}_+$ and $t \in \mathbb{Z}_+$ satisfy the following conditions:
14              (1) $t \cdot w - \sum_{r=1}^{t} w_{e_r} = \Delta$
15              (2) $w_{e_t} \le w \le w_{e_{t+1}}$
16          Set $w_{e_r} = w$ for $r = 1, \dots, t$ in $G_0$
17      else
18          $C \leftarrow C \setminus \{A, J\}$
19  Run Hungarian Method on $G_0$
20  begin Reset $G_0$
21      while there exists $\{A, J\} \in C$ not closed in $G_0$ do
22          foreach edge weight $w_{ij}$ modified in $\{A, J\}$ do
23              Reset $w_{ij}$
24          Perform one iteration of Hungarian Method
25          $C \leftarrow C \setminus \{A, J\}$
26  Move all closed subsets in $C$ to the second stage
27  Redefine $G_k$ to be the subgraph of agents and jobs moved to the second stage for scenario $k$
28  Run Hungarian Method on $G_k$ for all $k$
29  $Z_1^{EGA} = \sum_{i \in G_0} w_{i,\mathrm{mate}[i]} + \sum_k \sum_{i \in G_k} q^k_{i,\mathrm{mate}_k[i]}$
30  return $Z_1^{EGA}$, $G_0$, and $G_k$ $\forall k$


Then,  HM  is  applied  on  the  resulting  graph,  all  assignments  with  a  modified  weight  are  moved  to 
the  second  stage,  and  all  edge  weights  are  reset  to  their  original  values  (lines  1  and  2  of  Algorithm 
1).  We  want  to  emphasize  that  after  this  point,  EGA-I  works  only  with  agents  and  jobs  that  are 
not  moved  to  the  second  stage  in  line  1. 
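
As a rough illustration of this initial step, the following Python sketch applies update (177) and reports which assignments would be moved to the second stage. It is only a sketch under stated assumptions: the function name `unit_cardinality_step` is hypothetical, `q[k][i][j]` is assumed to already hold the probability-weighted weights, and a brute-force search over permutations stands in for HM, so it is practical only for very small $n$.

```python
from itertools import permutations


def unit_cardinality_step(w, q):
    """Apply update (177), solve the assignment, list pairs moved to stage two.

    w[i][j]    : first-stage weight of assigning agent i to job j
    q[k][i][j] : probability-weighted second-stage weight in scenario k (assumed)
    """
    n = len(w)
    # W_2[{i},{j}]: expected second-stage weight of the single pair (i, j)
    w2 = [[sum(qk[i][j] for qk in q) for j in range(n)] for i in range(n)]
    # Update (177): keep the better of the first- and second-stage values
    updated = [[max(w[i][j], w2[i][j]) for j in range(n)] for i in range(n)]
    # Brute-force stand-in for the Hungarian Method (small n only)
    mate = max(permutations(range(n)),
               key=lambda m: sum(updated[i][m[i]] for i in range(n)))
    # Assignments whose weight was modified are moved to the second stage
    moved = [(i, mate[i]) for i in range(n) if w2[i][mate[i]] > w[i][mate[i]]]
    return updated, list(mate), moved
```

For instance, with two agents where only the pair (1, 1) is worth more in expectation at the second stage, the sketch keeps the assignment (0, 0) in the first stage and moves (1, 1).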

Next EGA-I checks the optimality condition for sets with cardinality greater than 1, i.e., $|A| =
|J| > 1$. There are $O(2^n)$ possible ways that the subsets $A$ and $J$ can be selected. Furthermore, in
order to calculate $W_1[A, J]$ and $W_2[A, J]$, one needs to solve $K + 1$ deterministic linear assignment
problems for every subset, where $K$ is the number of scenarios. Thus, it is computationally
prohibitive to check all subsets of agents and jobs. Instead, EGA-I considers only a few subsets that
are promising and easy to check.

Definition  3  (Closed  Subset)  A  closed  subset  in  the  first  stage  is  a  pair  of  subsets  A  of  agents 
and  J  of  jobs  such  that  all  agent-job  assignments  remain  within  these  two  sets  in  the  first-stage 
myopic  solution,  i.e., 


$$J = \{ j \in U \mid j = \mathrm{mate}[i] \text{ for some } i \in A \},$$

$$A = \{ i \in V \mid i = \mathrm{mate}[j] \text{ for some } j \in J \}.$$

A  closed  subset  in  the  second  stage  is  a  pair  of  subsets  A  of  agents  and  J  of  jobs  such  that  all 
agent-job  assignments  remain  within  these  two  sets  across  all  scenarios  in  the  second-stage  myopic 
solutions,  i.e., 


$$J = \{ j \in U \mid j = \mathrm{mate}_k[i] \text{ for some } i \in A \text{ and some scenario } k \},$$

$$A = \{ i \in V \mid i = \mathrm{mate}_k[j] \text{ for some } j \in J \text{ and some scenario } k \}.$$

Here  we  want  to  provide  some  details  about  how  EGA-I  finds  closed  subsets.  For  the  first-stage 
myopic  solution,  given  a  subset  of  agents  A,  we  simply  construct  J  from  the  mates  of  agents  in 
A.  For  the  second  stage,  EGA-I  starts  constructing  a  closed  subset  in  the  second  stage  with  empty 
sets  A  of  agents  and  J  of  jobs.  Given  the  second-stage  myopic  solution,  we  arbitrarily  choose  an 
agent  and  add  it  to  the  set  of  agents  A.  Then,  all  jobs  that  this  agent  is  assigned  to  across  various 
scenarios  are  added  to  the  set  of  jobs  J.  Notice  that  an  agent  may  be  assigned  to  the  same  job  in 
several  scenarios.  Then,  for  all  jobs  that  are  selected  in  the  previous  step,  the  algorithm  updates  A 
to  find  all  agents  that  jobs  from  J  are  assigned  to  across  all  scenarios.  The  algorithm  continues  in 
this  manner  until  both  sets  cease  to  change,  which  implies  that  a  closed  subset  is  constructed.  The 
whole  process  described  above  is  repeated  for  the  remaining  agents  and  jobs  until  a  partition  of  the 
set  of  agents  and  jobs  into  a  set  of  closed  subsets  is  obtained. 
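
The construction just described amounts to growing a pair of sets until they stop changing. A minimal Python sketch is given below; the function name `closed_subsets` and the input layout `mates[k][i]` (the job assigned to agent `i` in scenario `k`, with every scenario assumed to be a complete assignment) are illustrative assumptions, and the lowest-numbered remaining agent is used as the "arbitrary" starting agent only to make the output deterministic.

```python
def closed_subsets(mates):
    """Partition agents and jobs into second-stage closed subsets.

    mates[k][i] is the job assigned to agent i in scenario k (assumed layout).
    Returns a list of (A, J) pairs of sets.
    """
    n = len(mates[0])
    # Inverse maps: for each scenario, the agent assigned to each job
    agent_of = [{job: agent for agent, job in enumerate(m)} for m in mates]
    remaining = set(range(n))
    result = []
    while remaining:
        A, J = {min(remaining)}, set()
        changed = True
        while changed:
            # Jobs reached from A in any scenario, then agents reached from those jobs
            new_J = {m[i] for m in mates for i in A}
            new_A = A | {inv[j] for inv in agent_of for j in new_J}
            changed = (new_A != A) or (new_J != J)
            A, J = new_A, new_J
        result.append((A, J))
        remaining -= A
    return result
```

With two scenarios in which agents 0 and 1 swap jobs while agents 2 and 3 keep theirs, the sketch returns one closed subset of cardinality two and two of cardinality one.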

If $A$ and $J$ correspond to one of the closed subsets in the second stage found by our algorithm,
then by the definition of a closed subset, we have that $|A| = |J|$. Moreover, the value of $W_2[A, J]$
is readily available and is calculated using the second-stage myopic solutions. After finding closed
subsets in the second stage, EGA-I identifies the ones that satisfy (line 7):

$$\sum_{i \in A} \alpha_i + \sum_{j \in J} \beta_j < W_2[A, J], \tag{178}$$

where $\alpha_i$ and $\beta_j$ are dual variables associated with the first-stage myopic solution and the respective
dual problem (165)-(166). Clearly, closed subsets that satisfy (178) violate the necessary optimality
condition since the left-hand side of (178) is an upper bound on $W_1[A, J]$. We use the dual solution
due to the fact that the pair $(A, J)$ is not necessarily closed in the first stage and we do not want
to solve an assignment problem to find $W_1[A, J]$.
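
The test in (178) needs only quantities that are already at hand. The sketch below, with hypothetical function names and with `q[k][i][j]` assumed to be probability-weighted, computes $W_2[A, J]$ from the second-stage mates and returns the gap between it and the dual upper bound; a positive gap means that (178) holds.

```python
def second_stage_weight(A, q, mates):
    """W_2[A, J] for a second-stage closed subset: total weight of the
    scenario assignments of the agents in A (q assumed probability-weighted)."""
    return sum(q[k][i][mates[k][i]] for k in range(len(q)) for i in A)


def dual_gap(A, J, alpha, beta, w2_value):
    """W_2[A, J] minus the dual upper bound on W_1[A, J]; positive iff (178) holds."""
    bound = sum(alpha[i] for i in A) + sum(beta[j] for j in J)
    return w2_value - bound
```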

For all subsets that are closed in the second stage and satisfy (178), EGA-I updates the first-stage
weights (lines 6-16). In contrast to GAE-I, updating the first-stage weights properly turns out to be
a more difficult task when we have more than one agent-job pair to consider. Our main goal is to
update edge weights in the first stage for each closed subset of the second stage in such a way that:
(1) the resulting assignment with updated weights should favor the sets to be also closed in the first
stage and (2) if the set becomes also closed in the first stage, then the weight of the assignment
within this set should be exactly $W_2[A, J]$.

Next we discuss in detail the weight update procedure for a pair $(A, J)$. First, $W_1[A, J]$ is
computed by running HM. Define $E$ to be the set of all agent-job assignments in the obtained
solution. Then we increase the weights $w_{ij}$ in the first stage only for pairs $(i, j) \in E$ according to
the following procedure (see also lines 12-16 of Algorithm 1).

• Let $E = \{e_1, \dots, e_{|A|}\}$ be sorted in non-decreasing order of the edge weights.

• Find $w \in \mathbb{R}_+$ and $t \in \mathbb{Z}_+$, $t \le |A|$, that satisfy the following conditions:

$$\sum_{r=1}^{t} (w - w_{e_r}) = t \cdot w - \sum_{r=1}^{t} w_{e_r} = \Delta, \tag{179}$$

$$w_{e_t} \le w \le w_{e_{t+1}}, \tag{180}$$

where we assume that $w_{e_{|A|+1}} = +\infty$.

• Set the weight of each $e_r \in E$, $1 \le r \le t$, to be $w$.

Let $\overline{W}_1[A, J]$ be the weight of the optimal assignment in $(A, J)$ after the weight update procedure
described above. It is easy to observe that $\overline{W}_1[A, J] = W_1[A, J] + \Delta = W_2[A, J]$.

As an example, assume that $\{3, 10, 25\}$ are the weights of the assignment in $E$ (i.e., we have 3
agents and 3 jobs with selected assignment edges that have values 3, 10, and 25). Thus $W_1[A, J] =
38$. Let $W_2[A, J] = 50$. Then, $\Delta = 12$. If we start with $t = 1$, then by (179), we get $w = 15$, which
violates (180). Incrementing $t$, we set $t = 2$ and find $w = 12.5$, which satisfies (180). Thus we set
$w_{e_1} = w_{e_2} = 12.5$. Now the weight after the update is $\overline{W}_1[A, J] = 12.5 + 12.5 + 25 = 50 = W_2[A, J]$.
Simply speaking, we increase the weights of the edges with the smallest weights until we have the
total increase of $\Delta$.
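
The sequential search for $t$ and $w$ in (179)-(180) can be sketched in a few lines of Python. The function name `raise_smallest_weights` is hypothetical, and the sketch assumes $\Delta > 0$ and returns the first index $t$ for which the upper bound in (180) holds.

```python
def raise_smallest_weights(weights, delta):
    """Raise the t smallest weights to a common level w so that the total
    increase equals delta, following (179)-(180). Assumes delta > 0.

    Returns (t, w, updated weights in non-decreasing order).
    """
    ws = sorted(weights)
    prefix = 0.0
    for t in range(1, len(ws) + 1):
        prefix += ws[t - 1]
        level = (delta + prefix) / t          # solves t*w - sum_{r<=t} w_{e_r} = delta
        upper = ws[t] if t < len(ws) else float("inf")  # w_{e_{|A|+1}} = +infinity
        if level <= upper:                    # upper bound of (180); lower bound holds by construction
            return t, level, [level] * t + ws[t:]
    raise ValueError("delta must be positive and weights non-empty")
```

On the example above, `raise_smallest_weights([3, 10, 25], 12)` stops at $t = 2$ with $w = 12.5$, and the updated weights sum to 50.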

Subsequently,  EGA-I  finds  the  first-stage  myopic  solution  with  the  updated  weights  (line  19). 
For every closed subset $(A, J)$, we check whether it remains closed in the first stage, i.e., for all $i \in A$ we
have that $\mathrm{mate}[i] \in J$. If this is not the case, we restore each modified assignment weight of $[A, J]$
and  perform  one  iteration  of  HM  to  restore  optimality.  This  phase  ends  when  all  the  remaining  closed 
subsets  are  also  closed  in  the  first  stage.  Note  that  the  number  of  weights  restored  and  the  number 
of  HM  iterations  performed  are  bounded  by  n,  the  number  of  agents.  Next,  the  remaining  closed  sets 
are  moved  to  the  second  stage  (line  26).  Finally,  EGA-I  solves  K  deterministic  assignment  problems 
with  all  the  agents  and  jobs  moved  to  the  second  stage  (including  those  moved  after  line  1)  to  find 
the  second  stage  solution,  and  outputs  the  resulting  assignment. 

Lemma 2 Let $Z_1^{EGA}$ be the weight of the assignment returned by EGA-I. Then

$$Z_1^{EGA} \ge Z_1^{GAE} \ge Z^{GA}. \tag{181}$$

Proof: It is clear that the solution found after line 1 is exactly the solution found by GAE-I. Next
consider the first-stage solution and edge weights after line 21. Let $\{A, J\} \in C$ be a closed subset
that is not removed from consideration during the procedure between lines 20-21. Any assignment
edge in the first stage that does not belong to closed subsets from $C$ now has its original value as
it was reset in line 23. Any assignment edge weight that belongs to a closed subset $\{A, J\} \in C$ is
at least as large as its original value because the weight update mechanism given by (179)-(180)
only increases the weights of the edges. By construction, since $\{A, J\} \in C$ is also closed in the first
stage, the total weight of the assignments in this subset, $\overline{W}_1[A, J]$ ($\ge W_1[A, J]$), is exactly equal to
$W_2[A, J]$. It implies that there exists an assignment of $A$ and $J$ at the second stage with the same
weight. Hence, the weight of the assignment returned after line 21 is at least as large as the weight of
the assignment returned after line 1. The necessary result follows. □


6.4.2  Improving  the  second-stage  assignment  (EGA-II) 

EGA-II  starts  with  all  agent-job  assignments  made  in  the  second  stage  and  then  attempts  to  move 
some  of  them  to  the  first  stage  if  it  is  worth  doing  so.  Due  to  the  lack  of  control  to  preserve  closed 
subsets  after  each  weight  update  in  the  second  stage,  this  approach  is  more  sensitive  to  variations  in 
assignments  across  the  scenarios.  Using  the  necessary  optimality  condition,  EGA-II  tries  to  achieve 
dual  feasibility  of  the  model  given  by  (175)  and  (176).  First,  it  solves  the  optimization  problem  given 
by  the  objective  function  (174)  and  the  constraint  set  (176).  Since  the  constraints  are  separable  for 
the  scenarios,  we  use  HM  to  solve  the  assignment  problem  for  each  scenario  separately  (lines  2-3  of 
Algorithm  2).  Therefore,  initially  all  second-stage  constraints  of  the  form 

$$\alpha_{ik} + \beta_{jk} \ge q_{ij}^k, \quad \forall i, j, k \tag{182}$$

are satisfied as the obtained assignments are the second-stage myopic solutions. Then the algorithm
searches for the pair $(i', j')$ such that

$$(i', j') = \arg\max_{(i,j)} \Big\{ w_{ij} - \sum_{k=1}^{K} (\alpha_{ik} + \beta_{jk}) \;\Big|\; w_{ij} - \sum_{k=1}^{K} (\alpha_{ik} + \beta_{jk}) > 0 \Big\}.$$


If such a pair $(i', j')$ does not exist, then the algorithm stops. Otherwise, it implies that the first-stage
dual constraint $\sum_{k=1}^{K} (\alpha_{i'k} + \beta_{j'k}) \ge w_{i'j'}$ is violated. Then consider the slack
$s_k = \alpha_{i'k} + \beta_{j'k} - q_{i'j'}^k$ for each scenario (line 8). EGA-II updates the weight $q_{i'j'}^k$ according to
the following scheme:

• If $\sum_k s_k > 0$: apply the update in line 11 of Algorithm 2.

• If $\sum_k s_k = 0$: apply the update in line 14 of Algorithm 2.


The intuition behind these update strategies is attempting to keep agent $i'$ assigned to job $j'$ across
all scenarios after we update arc weights $q_{i'j'}^k$. Note that after the weight update, we have

$$\sum_{k=1}^{K} q_{i'j'}^k = w_{i'j'}.$$

Therefore, any labeling feasible for the second-stage dual constraints $\alpha_{i'k} + \beta_{j'k} \ge q_{i'j'}^k$, $\forall k$, is
also feasible for the respective first-stage dual constraint.
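
The search for the most violated first-stage dual constraint described above can be sketched as follows. The function name `most_violated_pair` and the layout `alpha[k][i]`, `beta[k][j]` for the second-stage dual values are assumptions made for illustration; the weight update itself is not reproduced here.

```python
def most_violated_pair(w, alpha, beta):
    """Return (i, j, violation) for the pair maximizing
    w[i][j] - sum_k (alpha[k][i] + beta[k][j]) among pairs with positive
    violation, or None if every first-stage dual constraint (175) holds.

    alpha[k][i], beta[k][j]: second-stage dual values for scenario k (assumed layout).
    """
    n = len(w)
    alpha_sum = [sum(a[i] for a in alpha) for i in range(n)]
    beta_sum = [sum(b[j] for b in beta) for j in range(n)]
    best = None
    for i in range(n):
        for j in range(n):
            violation = w[i][j] - alpha_sum[i] - beta_sum[j]
            if violation > 0 and (best is None or violation > best[2]):
                best = (i, j, violation)
    return best
```

A return value of `None` corresponds to the stopping condition of the loop, that is, to (183) being satisfied for all pairs.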

Algorithm 2: EGA-II

Input: $n$ agents, $n$ jobs, $K$ scenarios, $w_{ij}$, $q_{ij}^k$, $p_k$.

 1  Let $q_{ij}^k := p_k q_{ij}^k$ and $G_k$ be the graph for the $k$th scenario in the second stage
 2  foreach scenario $k$ do
 3      Run Hungarian Method on $G_k$ to find an assignment for the $k$th scenario
 4  while there is an $i$ and $j$ such that $w_{ij} > \sum_k (\alpha_{ik} + \beta_{jk})$ do
 5      Let $(i', j') = \arg\max \{ w_{ij} - \sum_k (\alpha_{ik} + \beta_{jk}) \}$
 6      Add $(i', j')$ to set $R$
 7      begin Update assignments on $G_k$ $\forall k$
 8          Let $s_k = \alpha_{i'k} + \beta_{j'k} - q_{i'j'}^k$    // Slack for the $k$th scenario
 9          if $\sum_k s_k > 0$ then
10              foreach scenario $k$ do
11                  $q_{i'j'}^k := q_{i'j'}^k + (w_{i'j'} - \sum_k q_{i'j'}^k) \cdot (\ldots)$
12          else
13              foreach scenario $k$ do
14                  $q_{i'j'}^k := \ldots$
15          foreach scenario $k$ do
16              Remove edge $(\mathrm{mate}_k[j'], j')$ from assignment on $G_k$
17              $\beta_{j'k} = \max_i (q_{ij'}^k - \alpha_{ik})$
18              Perform one iteration of Hungarian Method on $G_k$
19  begin Reset all $G_k$
20      while there is a pair $(i, j) \in R$ such that $\mathrm{mate}_k[i] \ne j$ for some $k$ do
21          foreach scenario $k$ do
22              Reset edge weight $q_{ij}^k$ to its original value
23              if $\mathrm{mate}_k[i] = j$ then
24                  Remove assignment $(i, j)$ from $G_k$
25                  Perform one iteration of Hungarian Method on $G_k$
26          Remove the pair $(i, j)$ from $R$
27  Let $G_0$ be the graph for the first stage. Move all pairs $(i, j) \in R$ from $G_k$, $\forall k$, to $G_0$
28  Run Hungarian Method on $G_0$
29  $Z_2^{EGA} = \sum_{i \in G_0} w_{i,\mathrm{mate}[i]} + \sum_k \sum_{i \in G_k} q^k_{i,\mathrm{mate}_k[i]}$
30  return $Z_2^{EGA}$, $G_0$, and $G_k$ $\forall k$


Then, for each scenario, we remove the assignment between job $j'$ and its mate (line 16) and update
the dual variable $\beta_{j'k}$ (line 17), which is necessary to keep the respective constraints (176) satisfied.
Consequently, for each scenario, we lack only one agent-job assignment and the current labeling,
i.e., the values of dual variables $(\alpha, \beta)$, is feasible for (176). Thus, one iteration of HM (line 18) is
sufficient to achieve an optimal labeling for updated weights $q_{i'j'}^k$ for each scenario. This procedure
(lines 4-18) is performed until for every pair $(i, j)$ we have

$$\sum_{k=1}^{K} (\alpha_{ik} + \beta_{jk}) \ge w_{ij}, \quad \forall i, j. \tag{183}$$

Therefore,  the  following  result  holds. 

Proposition 5 Let $A$ and $J$ be a pair of subsets of agents and jobs, respectively, such that $|A| = |J|$.
Then after line 18 of Algorithm 2, we have that

$$\sum_{k=1}^{K} \Big( \sum_{i \in A} \alpha_{ik} + \sum_{j \in J} \beta_{jk} \Big) \ge W_1[A, J]. \tag{184}$$


Proof:  Follows  directly  from  (183).  □ 
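
The step from (183) to (184) can be spelled out as follows. This is only a sketch of the argument; the symbol $\sigma$, denoting an optimal first-stage assignment of the agents in $A$ to the jobs in $J$, is notation introduced here for illustration and does not appear in the original text.

```latex
% sigma : A -> J is a bijection attaining W_1[A, J] (notation assumed here)
\begin{align*}
W_1[A, J] &= \sum_{i \in A} w_{i,\sigma(i)}
  \le \sum_{i \in A} \sum_{k=1}^{K} \bigl( \alpha_{ik} + \beta_{\sigma(i)k} \bigr)
  && \text{by (183) applied to each pair } (i, \sigma(i)) \\
 &= \sum_{k=1}^{K} \Bigl( \sum_{i \in A} \alpha_{ik} + \sum_{j \in J} \beta_{jk} \Bigr)
  && \text{since } \sigma \text{ maps } A \text{ onto } J .
\end{align*}
```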

This result implies that after the above described procedure (lines 4-18), the obtained assignment
satisfies the necessary optimality condition for all closed subsets in the second stage. Therefore,
contrary to the case for EGA-I, EGA-II does not check subsets with cardinality strictly greater than one.

Now, consider an assignment $(i, \mathrm{mate}_k[i])$ in the $k$th scenario. If the weight $q^k_{i,\mathrm{mate}_k[i]}$ is an
updated weight, and we have the same assignment in all scenarios, i.e., $(i, \mathrm{mate}_k[i])$ is a closed
subset in the second stage, we can move this assignment to the first stage without changing the
total weight of the current assignments and without affecting other assignments. If we can do this
for all such pairs, then we have an optimal solution due to strong duality. However, it may not
be the case that each time we have the same assignment $(i, \mathrm{mate}[i])$ in all scenarios as the original
problem is NP-hard. Therefore, moving this assignment to the first stage will change assignments
in scenarios where we do not have the assignment $(i, \mathrm{mate}[i])$.

On the other hand, keeping assignment $(i, \mathrm{mate}[i])$ for the subsets that are not closed in the
second stage indicates that we have an updated arc weight which actually does not exist and we
cannot find an assignment corresponding to it (i.e., primal infeasible). EGA-II decreases the weight of
the assignment for such pairs to their original values (line 22) and updates the assignments across
all scenarios (lines 23-25) to accommodate this change. Finally, the remaining agent-job pairs with
updated weights are moved to the first stage and a separate assignment problem is solved for them
(lines 27-28).

Lemma 3 Let $Z_2^{EGA}$ be the weight of the assignment returned by EGA-II. Then

$$Z_2^{EGA} \ge Z_2^{GA}. \tag{185}$$

Proof: Denote by $(\alpha, \beta)$ the dual second-stage myopic solutions. Consider labeling $(\bar{\alpha}, \bar{\beta})$ obtained
after line 26 of Algorithm 2. Since the weight updates only increase the arc weights, we have

$$\sum_{k=1}^{K} \sum_{i=1}^{n} \bar{\alpha}_{ik} + \sum_{k=1}^{K} \sum_{j=1}^{n} \bar{\beta}_{jk} \ge \sum_{k=1}^{K} \sum_{i=1}^{n} \alpha_{ik} + \sum_{k=1}^{K} \sum_{j=1}^{n} \beta_{jk}.$$

Observe that the procedure after line 26 can only potentially improve the weight of the final
assignment returned by EGA-II, which concludes the proof. □

Theorem 6 Approximation bounds given for GAE in Theorems 4 and 5 are valid for EGA, and its
solution satisfies

$$Z^{EGA} \ge Z^{GAE} \ge Z^{GA}.$$

Proof:  The  necessary  result  directly  follows  from  Lemmas  2  and  3.  □ 

Proposition 6 Time complexity of EGA is $O(Kn^4)$.

Proof: First we consider the complexity of EGA-I. It is easy to observe that lines 1-4 take $O(Kn^3)$
time. The next step is to find closed subsets. There can be at most $n$ closed subsets and $O(Kn)$ time
is required in the worst case to construct each of them. Thus, in total $O(Kn^2)$ time is required
to construct all closed subsets. Next, solving the assignment problem for a closed subset (line 10)
would take $O(|A| |A|^2)$, which is bounded by $O(|A| n^2)$. Since $\sum_{\{A,J\} \in C} |A| \le n$, the total complexity
of solving the assignment problems for all closed subsets would take $O(n^3)$ time. The complexity
of sorting in line 12 is $O(|A| \lg |A|)$ and, similar to our previous argument, the total time complexity
of sorting would be $O(n \lg n)$. If one starts with $t = 1$ and tries each index sequentially, the total
time complexity of lines 13-16 for all closed subsets would be $O(n)$. Consequently, the complexity
of updating the weights is $O(n^3)$. At most $n$ iterations of HM are performed between lines 20-21
and thus the resetting procedure is of $O(n^3)$ time complexity. Finally, solving $K + 1$ assignment
problems (line 28) has a time complexity of $O(Kn^3)$. Consequently, the total time complexity of
EGA-I is $O(Kn^3)$.

The most time consuming procedure for EGA-II is updating weights and assignments (lines 4-18).
Since there are $n^2$ first-stage dual constraints, the outer loop requires $O(n^2)$ operations. The inner
loop is to increase cardinality of assignments by 1 across all scenarios, in the worst case. Since
each stage of HM requires $O(n^2)$ time, the inner loop requires at most $O(Kn^2)$ time. Thus, the time
complexity of EGA-II is $O(Kn^4)$. This completes the proof. □

6.4.3  Improving  EGA  with  Local  Search 

In this section we introduce a greedy local exchange based heuristic that seeks to further improve the
results obtained by EGA-I and EGA-II. Let $(X, Y) = (x, y_1, y_2, \dots, y_K)$ be a feasible assignment for
the 2SSLA problem. Here, we distinguish between an assignment $(i, j)$ and a pair $[i, j]$. The former
indicates that agent $i$ is assigned to job $j$, whereas the latter does not imply any
dependence. We say that pair $[i, j]$ belongs to the partial solution $X$ (or $Y$) if assignments $(i, \mathrm{mate}[i])$
and $(\mathrm{mate}[j], j)$ are at the first stage (or second stage). Define the following neighborhoods for a
given solution $(X, Y)$:

• Neighborhood $N_1$ for solution $(X, Y)$ is defined to be the set of all solutions obtained by
moving any pair $[i, j]$ from $X$ to $Y$. This implicitly requires $[i, j] \in X$. Thus, to maintain
feasibility, if a solution $(\bar{X}, \bar{Y}) \in N_1(X, Y)$, then we have $(\mathrm{mate}[j], \mathrm{mate}[i]) \in \bar{X}$ and $(i, j) \in \bar{Y}$,
assuming that $(\bar{X}, \bar{Y})$ is obtained from $(X, Y)$ with respect to pair $[i, j]$. This exchange process
is illustrated in Figure 9.

• Neighborhood $N_2$ for solution $(X, Y)$ is defined to be the set of all solutions obtained by moving
any pair $[i, j]$ from $Y$ to $X$. This implicitly requires $[i, j] \in Y$. Thus, to maintain feasibility,
if a solution $(\bar{X}, \bar{Y}) \in N_2(X, Y)$, then we have $(i, j) \in \bar{X}$ and $(\mathrm{mate}[j], \mathrm{mate}[i]) \in \bar{Y}$. This
exchange process is illustrated in Figure 10.

Proposition 7 The solutions obtained by GAE and EGA-I are locally optimal with respect to the
neighborhood $N_1$.

Figure 9: The neighborhood $N_1$.

Figure 10: The neighborhood $N_2$.


Proof: Let $(X, Y)$ be the solution returned after line 1 of Algorithm 1. Consider a pair $[i, j] \in X$.
We check whether there is a better solution $(\bar{X}, \bar{Y}) \in N_1(X, Y)$ with respect to $[i, j]$ as follows:

$$\sum_{k=1}^{K} q_{ij}^k > w_{i,\mathrm{mate}[i]} + w_{\mathrm{mate}[j],j} - w_{\mathrm{mate}[j],\mathrm{mate}[i]}. \tag{186}$$
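
Assume that $[i, j]$ satisfies (186) and consider the respective dual solution for $X$. Then we know
that the constraints (166) are tight for $(i, \mathrm{mate}[i])$ and $(\mathrm{mate}[j], j)$:

$$\alpha_i + \beta_{\mathrm{mate}[i]} = w_{i,\mathrm{mate}[i]}, \quad \alpha_{\mathrm{mate}[j]} + \beta_j = w_{\mathrm{mate}[j],j}, \quad \text{and} \quad \alpha_{\mathrm{mate}[j]} + \beta_{\mathrm{mate}[i]} \ge w_{\mathrm{mate}[j],\mathrm{mate}[i]}.$$

Then, from (186), we have

$$\sum_{k=1}^{K} q_{ij}^k > w_{i,\mathrm{mate}[i]} + w_{\mathrm{mate}[j],j} - w_{\mathrm{mate}[j],\mathrm{mate}[i]}
\ge (\alpha_i + \beta_{\mathrm{mate}[i]}) + (\alpha_{\mathrm{mate}[j]} + \beta_j) - (\alpha_{\mathrm{mate}[j]} + \beta_{\mathrm{mate}[i]})
= \alpha_i + \beta_j.$$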


However, this is not possible because the necessary optimality condition for sets with unit cardinality
implies that $\alpha_i + \beta_j \ge w_{ij} \ge \sum_{k=1}^{K} q_{ij}^k$ due to the update in (177). Therefore, the necessary result
follows. □

Next consider EGA-II. Let $(X, Y)$ be the solution obtained by EGA-II and let $[i, j] \in Y$. We check
whether switching to a solution in the neighborhood $N_2$ of $(X, Y)$ with respect to $[i, j]$ improves
our current solution. Formally, we verify whether

$$w_{ij} > \sum_{k=1}^{K} \bigl( q^k_{i,\mathrm{mate}_k[i]} + q^k_{\mathrm{mate}_k[j],j} - q^k_{\mathrm{mate}_k[j],\mathrm{mate}_k[i]} \bigr). \tag{187}$$

If (187) is satisfied, then removing pair $[i, j]$ for all scenarios in the second stage and assigning $i$ to
$j$ in the first stage improves our solution. This process is illustrated in Figure 10. However, we may
further improve our new solution by running one iteration of the Hungarian Method for the first stage
and $K$ iterations of the Hungarian Method for the second stage. Since each iteration of the Hungarian
Method requires $O(n^2)$ time, this update requires $O(Kn^2)$ time. Furthermore, at most $n$ pairs may
be moved to the first stage, which results in a total time complexity of $O(Kn^3)$ for local search after
EGA-II.
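
The two exchange tests can be written as gain computations, where a positive gain means that the move improves the current solution. The Python sketch below follows the sign pattern of (186) and the analogous second-stage test (187); the function names and the `mate` / `agent_of` lookup tables are hypothetical, and `q[k][i][j]` is again assumed to hold probability-weighted weights.

```python
def gain_to_second_stage(i, j, w, q, mate, agent_of):
    """Gain of the N_1 move for pair [i, j]; positive exactly when (186) holds.

    mate[i] is the first-stage job of agent i, agent_of[j] the first-stage
    agent of job j (assumed lookup tables).
    """
    expected = sum(qk[i][j] for qk in q)
    return (expected + w[agent_of[j]][mate[i]]
            - w[i][mate[i]] - w[agent_of[j]][j])


def gain_to_first_stage(i, j, w, q, mates, agents_of):
    """Gain of the N_2 move for pair [i, j]; positive exactly when (187) holds.

    mates[k][i] and agents_of[k][j] are the scenario-k lookup tables (assumed).
    """
    released = 0.0
    for qk, mate_k, agent_k in zip(q, mates, agents_of):
        released += (qk[i][mate_k[i]] + qk[agent_k[j]][j]
                     - qk[agent_k[j]][mate_k[i]])
    return w[i][j] - released
```

A local search would evaluate these gains for the candidate pairs, apply a move with positive gain, and then re-optimize with HM as described above.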


6.4.4  Analytical  Observations 

Next, we discuss the performance of GA, GAE, and EGA on two carefully constructed classes of test
instances. Assume that in both classes, we have $2n$ agents, $2n$ jobs, and $K$ scenarios, $K \le n$. We
partition the set of all agents and jobs into two groups: $G_1$ and $G_2$, where $G_1$ contains the first $n$
agents and $n$ jobs and $G_2$ contains the remaining $n$ agents and $n$ jobs. Now we describe two types of
instances.


Split Instances: We construct this type of instances as follows:

$$w_{ij} = \begin{cases} 1 & \text{for } (i,j) \in G_1, \\ 2K & \text{for } (i,j) \in G_2 \text{ and } i = j, \\ 0 & \text{o/w,} \end{cases}$$

$$q_{ij}^k = \begin{cases} K & \text{for } (i,j) \in G_1 \text{ and } i + k - 1 \equiv j \pmod{n}, \\ K & \text{for } (i,j) \in G_2, \\ 0 & \text{o/w,} \end{cases}$$

$$p_k = 1/K, \quad k = 1, \dots, K.$$

The structure of these instances is illustrated in Figure 11.

Figure 11: A split instance for $K = n = 2$ (only nonzero arcs are shown). Thick lines show first-stage
and second-stage myopic solutions.

It is optimal to make assignments for $G_2$ in the first stage while for $G_1$ it is optimal to make assignments
in the second stage. Therefore, the total weight of the optimal assignment is $3nK$.
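
For very small instances these values can be checked by exhaustive enumeration. The Python sketch below builds a split instance and computes the first-stage myopic value, the expected second-stage myopic value, and the best two-stage value by brute force. The function names are hypothetical, the weights $q_{ij}^k$ are kept unscaled with the probabilities $p_k$ passed separately, and the model assumed is that a subset of agents and jobs is matched in the first stage while the rest is matched separately in each scenario. It is only feasible for tiny $n$.

```python
from itertools import combinations, permutations


def best_matching(weight, agents, jobs):
    """Max-weight perfect matching between equally sized lists (brute force)."""
    if not agents:
        return 0.0
    return max(sum(weight[a][j] for a, j in zip(agents, perm))
               for perm in permutations(jobs))


def split_instance(n, K):
    """Split instance on 2n agents/jobs; indices are 1-based as in the text."""
    m = 2 * n
    w = [[0.0] * m for _ in range(m)]
    q = [[[0.0] * m for _ in range(m)] for _ in range(K)]
    for i in range(1, m + 1):
        for j in range(1, m + 1):
            in_g1 = i <= n and j <= n
            in_g2 = i > n and j > n
            if in_g1:
                w[i - 1][j - 1] = 1.0
            elif in_g2 and i == j:
                w[i - 1][j - 1] = 2.0 * K
            for k in range(1, K + 1):
                if (in_g1 and (i + k - 1 - j) % n == 0) or in_g2:
                    q[k - 1][i - 1][j - 1] = float(K)
    return w, q, [1.0 / K] * K


def two_stage_values(w, q, p):
    """Return (first-stage myopic, expected second-stage myopic, best two-stage)."""
    everyone = list(range(len(w)))
    first = best_matching(w, everyone, everyone)
    second = sum(pk * best_matching(qk, everyone, everyone) for pk, qk in zip(p, q))
    best = 0.0
    for size in range(len(w) + 1):
        for A in combinations(everyone, size):          # agents fixed in stage one
            rest_a = [a for a in everyone if a not in A]
            for J in combinations(everyone, size):      # jobs fixed in stage one
                rest_j = [j for j in everyone if j not in J]
                value = best_matching(w, list(A), list(J)) + sum(
                    pk * best_matching(qk, rest_a, rest_j) for pk, qk in zip(p, q))
                best = max(best, value)
    return first, second, best
```

For $K = n = 2$, `two_stage_values(*split_instance(2, 2))` gives the first-stage myopic value $n(2K + 1) = 10$, the expected second-stage myopic value $2nK = 8$, and the optimal value $3nK = 12$.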


Interleaved Instances: We construct this type of instances as follows:

$$w_{ij} = \begin{cases} 1 & \text{for } (i,j) \in G_1, \\ 2K & \text{for } (i,j) \in G_2 \text{ and } i = j, \\ 0 & \text{o/w,} \end{cases}$$

$$q_{ij}^k = \begin{cases} K & \text{for } (i,j) \in G_1 \text{ and } i + k - 1 \equiv j \pmod{n}, \\ K & \text{for } i \in G_1, \; j \in G_2, \text{ and } i + k - 1 \equiv j - n \pmod{n}, \\ K & \text{for } i \in G_2, \; j \in G_1, \text{ and } i + k - 1 \equiv j \pmod{n}, \\ 0 & \text{o/w,} \end{cases}$$

$$p_k = 1/K, \quad k = 1, \dots, K.$$

The structure of these instances is illustrated in Figure 12. The
optimal solution should have all assignments within $G_1$ in the second stage and all assignments
within $G_2$ in the first stage, with the total weight of $3nK$.


Proposition 8 The weights of the assignments obtained by GA, GAE, and EGA for split instances are
$n(2K + 1)$, $n(2K + 1)$, and $3nK$, respectively.

Proof: It is easy to check that GA would make all assignments in the first stage and the total
weight of this assignment would be $n(2K + 1)$. Next we consider GAE. From Figure 11, it is clear
that the first-stage myopic solution satisfies the necessary optimality condition for the sets of unit
cardinality. Thus, the solution returned by GAE is the same as the solution returned by GA.

Finally, we consider EGA. Notice that the total weight of the assignments within $G_1$ in the first stage is $n$, whereas the expected total weight of assignments within $G_1$ in the second stage is $nK$. Thus, the first-stage myopic solution violates the necessary optimality condition. Since $G_1$ is a closed subset in the second-stage myopic solution, EGA-I would move all assignments within $G_1$ to the second stage. Therefore, the weight of the assignment returned by EGA-I is at least $3nK$. Since the weight of the optimal assignment for ‘split’ instances is $3nK$ and EGA returns the best of EGA-I and EGA-II, EGA finds the optimal solution. □

Figure 12: An interleaved instance for K = n = 2 (only nonzero arcs are shown). Thick lines show first-stage and second-stage myopic solutions.

Proposition 9 The weights of the assignments obtained by GA, GAE, and EGA for interleaved instances
are $n(2K + 1)$, $n(2K + 1)$, and $3nK$, respectively.

Proof:  Similar  to  the  proof  of  Proposition  8.  □ 

We  want  to  conclude  this  section  by  emphasizing  the  importance  of  the  constructed  problem 
instances  as  they  demonstrate  that  both  GA  and  GAE  can  give  results  significantly  away  from  optimal, 
while  EGA  returns  the  optimal  solution  for  both  of  these  classes  of  test  instances. 

6.5  Computational  Experiments 
6.5.1  Setup 

Five  classes  of  test  instances  are  used  in  our  computational  experiments.  The  first  two  classes  are 
similar  to  the  ones  used  in  [52],  which  allow  us  to  provide  an  unbiased  comparison  of  GAE  and  EGA. 

Uncorrelated Instances: All the edge weights are drawn from $N(10, 15)$, the normal distribution with mean 10 and standard deviation 15.

\[
w_{ij} \sim N(10, 15), \quad \forall i, j, \qquad
q^k_{ij} \sim N(10, 15), \quad \forall i, j, k.
\]

If the generated weight is negative, then it is set to zero. All scenarios have the same probability.
Correlated Instances: For these instances, the second-stage weights are correlated.

\[
w_{ij} \sim N(10, 15), \quad \forall i, j, \qquad
\bar{q}_{ij} \sim N(10, 15), \quad \forall i, j, \qquad
q^k_{ij} \sim \bar{q}_{ij} + N(0, 5), \quad \forall i, j, k.
\]

If the generated weight is negative, then it is set to zero. All scenarios have the same probability.
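As an illustration of the two generation schemes described above, the following Python sketch draws uncorrelated and correlated instances using only the standard library. It is a minimal sketch under stated assumptions: the function names, the seeding, and the choice to truncate only the final weight (rather than the base weight) at zero in the correlated case are not specified in the text.

```python
import random

def truncated_normal(rng, mean, sd):
    """Draw from N(mean, sd) and set negative draws to zero."""
    return max(0.0, rng.gauss(mean, sd))

def uncorrelated_instance(n, K, seed=0):
    """All first- and second-stage weights are independent N(10, 15) draws."""
    rng = random.Random(seed)
    w = [[truncated_normal(rng, 10, 15) for _ in range(n)] for _ in range(n)]
    q = [[[truncated_normal(rng, 10, 15) for _ in range(n)] for _ in range(n)]
         for _ in range(K)]
    p = [1.0 / K] * K                      # equiprobable scenarios
    return w, q, p

def correlated_instance(n, K, seed=0):
    """Second-stage weights share a base value per edge plus N(0, 5) noise."""
    rng = random.Random(seed)
    w = [[truncated_normal(rng, 10, 15) for _ in range(n)] for _ in range(n)]
    base = [[rng.gauss(10, 15) for _ in range(n)] for _ in range(n)]
    # Assumption: truncation at zero is applied after the noise is added.
    q = [[[max(0.0, base[i][j] + rng.gauss(0, 5)) for j in range(n)]
          for i in range(n)] for _ in range(K)]
    p = [1.0 / K] * K
    return w, q, p

w, q, p = correlated_instance(n=5, K=3, seed=42)
print(len(w), len(q), p)
```

In the correlated sketch an edge with a large base weight tends to be heavy in every scenario, which is the property the discussion of Table 5 relies on.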

The  intuition  behind  the  next  class  of  test  instances  is  to  have  the  necessary  optimality  condition 
satisfied  for  nearly  all  unit  cardinality  sets,  but  possibly  violated  for  subsets  with  two  agents  and 
two  jobs.  As  it  is  demonstrated  later,  both  GA  and  GAE  fail  to  identify  such  rather  simple  weight 
dependencies. 

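Pairwise-Correlated Instances: Unlike correlated instances, the correlation in these instances is not just on a single agent and job, but on pairs of agents and jobs.

\[
w_{ij} \sim
\begin{cases}
N(200, 40) & \text{for } i \equiv 0 \pmod{3}, \\
N(140, 30) & \text{for } i \equiv 1 \pmod{3} \text{ and } i \le j \le i + 1, \\
N(140, 30) & \text{for } i \equiv 2 \pmod{3} \text{ and } i - 1 \le j \le i, \\
N(10, 15) & \text{o/w.}
\end{cases}
\]

\[
q^k_{ij} \sim
\begin{cases}
N(200, 40) & \text{for } k \equiv 0 \pmod{2},\ i \equiv 1 \pmod{3}, \text{ and } j = i, \\
N(200, 40) & \text{for } k \equiv 1 \pmod{2},\ i \equiv 1 \pmod{3}, \text{ and } j = i + 1, \\
N(200, 40) & \text{for } k \equiv 0 \pmod{2},\ i \equiv 2 \pmod{3}, \text{ and } j = i, \\
N(200, 40) & \text{for } k \equiv 1 \pmod{2},\ i \equiv 2 \pmod{3}, \text{ and } j = i - 1, \\
N(10, 15) & \text{o/w.}
\end{cases}
\]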

Next we provide details for split-like and interleaved-like instances. Both classes are based on the instances introduced in Section 6.4.4 with modifications that are aimed at “randomizing” their structures. In particular, we add a third class of agents and jobs to the ‘split’ and ‘interleaved’ instances with uniformly generated assignment weights. Thus, we have $3n$ agents, $3n$ jobs, and $K$ scenarios, $K \le n$. Let $G_1$, $G_2$, and $G_3$ be the sets of the first, second, and third $n$ agents and jobs, respectively.

Split-like  Instances:  We  let 


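\[
w_{ij} \sim
\begin{cases}
U[900/K, 1000/K] & \text{for } (i,j) \in G_1, \\
U[500, 1000] & \text{for } (i,j) \in G_2, \\
U[100, 1000] & \text{for } (i,j) \in G_3, \\
U[2, 10] & \text{o/w.}
\end{cases}
\]

\[
q^k_{ij} \sim
\begin{cases}
U[800, 900] & \text{for } (i,j) \in G_1 \text{ and } i + k - 1 \equiv j \pmod{n}, \\
U[100, 500] & \text{for } (i,j) \in G_2, \\
U[100, 1000] & \text{for } (i,j) \in G_3, \\
U[2, 10] & \text{o/w.}
\end{cases}
\]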

Interleaved-like  Instances:  We  let 

\[
w_{ij} \sim
\begin{cases}
U[900/K, 1000/K] & \text{for } (i,j) \in G_1, \\
U[4000, 5000] & \text{for } (i,j) \in G_2 \text{ and } i = j, \\
U[100, 1000] & \text{for } (i,j) \in G_3, \\
U[2, 10] & \text{o/w.}
\end{cases}
\]

\[
q^k_{ij} \sim
\begin{cases}
U[800, 900] & \text{for } (i,j) \in G_1 \text{ and } i + k - 1 \equiv j \pmod{n}, \\
U[1500, 2000] & \text{for } i \in G_1,\ j \in G_2, \text{ and } i + k - 1 \equiv j - n \pmod{n}, \\
U[1500, 2000] & \text{for } i \in G_2,\ j \in G_1, \text{ and } i + k - 1 \equiv j \pmod{n}, \\
U[100, 1000] & \text{for } (i,j) \in G_3, \\
U[2, 10] & \text{o/w.}
\end{cases}
\]

The  probability  for  each  scenario  is  set  to  be  1  /K  for  each  instance  from  either  class. 

In our computational experiments, we use CPLEX 12.2 [82]. The algorithms are coded in C++
and run on a Windows XP based machine with an Intel Xeon 3 GHz processor and 3 GB RAM.
In  our  experiments  we  consider  problems  with  2,  3,  5,  10,  and  20  scenarios  and  10,  20,  50,  100,  and 
200  agents/jobs.  For  each  of  these  25  configurations,  we  conduct  10  replications  and  report  their 
averages. 

6.5.2  Results  and  Discussion 

We  report  statistics  for  CPLEX  solver  as  well  as  GA,  GAE,  and  EGA  algorithms.  The  first  two  columns, 
as  can  be  seen  in  Table  4,  are  self  explanatory.  We  provide  average  running  time  (in  seconds)  for 
CPLEX  in  Cplex  T  column.  We  have  enforced  a  time  limit  of  3600  seconds  on  CPLEX  and  an 
average  running  time  of  3600  seconds  in  this  column  implies  that  CPLEX  is  unable  to  solve  all 
IP  formulations  directly  to  optimality.  In  such  cases,  when  an  integral  solution  is  not  available, 
we  use  the  LP  relaxation  solution  for  comparison  purpose.  The  next  three  columns  report  the 
percentage  deviation  from  the  CPLEX  solution  for  GA-I,  GA-II,  and  the  overall  running  time  for 
GA in seconds. Next we provide the percentage deviation of the solution returned by GAE-I from
the CPLEX solution. Since GAE-II is the same as GA-II, we do not provide such information.
However, we report the combined time of GAE-I and GAE-II under the GAE T column. The next
three  columns  are  results  obtained  by  EGA.  Finally,  we  provide  results  for  EGA-II  with  local  search 
and  the  time  spent  for  the  local  search  procedure,  excluding  the  time  for  obtaining  EGA-II  solution 
prior  to  the  local  search  procedure.  We  have  marked  the  best  results  in  bold  in  all  tables. 
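The text does not spell out the formula for the percentage deviation. A plausible reading, assuming a maximization objective and the CPLEX value as the reference, is sketched below in Python; the function name and the exact formula are assumptions rather than the authors' definition.

```python
def percent_deviation(reference, value):
    """Relative shortfall of a heuristic value from a reference value, in percent.

    Assumes a maximization problem, so value <= reference gives a
    non-negative deviation.
    """
    return 100.0 * (reference - value) / reference

# Worked example with the split-instance weights from Proposition 8,
# using n = 10 and K = 5: optimum 3nK = 150, GA weight n(2K + 1) = 110.
n, K = 10, 5
print(round(percent_deviation(3 * n * K, n * (2 * K + 1)), 2))
```

Under this reading, a deviation of 0 means the heuristic matched the reference solution.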

Results  for  the  uncorrelated  instances  are  given  in  Table  4.  It  can  be  observed  that  CPLEX  has 
difficulty  in  solving  large  instances  whereas  the  running  time  does  not  exceed  5  seconds  for  all  other 
algorithms.  As  expected,  EGA  finds  the  best  results  in  all  cases.  EGA  solution  yields  a  significant 
improvement  over  GA  solution  and  is  reasonably  better  than  GAE  solution.  EGA-II  (with  and  without 
local  search)  is  successful  in  improving  the  second-stage  myopic  solution. 

Table 5 summarizes results for the correlated instances. Both EGA-I and EGA-II find nearly
optimal  solutions  and  local  search  further  improves  the  solution  of  EGA-II.  GAE-I  also  performs 
very  well  on  these  instances,  which  is  expected  due  to  the  structure  of  these  test  instances.  Note 
that  if  an  edge  has  a  large  weight  in  one  scenario,  then  it  should  have  a  large  weight  in  all  scenarios. 
Thus,  most  of  the  time,  it  is  better  to  make  assignments  for  the  agents  and  jobs  incident  to  such 
edges  in  the  second  stage.  Since  GAE  checks  the  necessary  optimality  condition  for  subsets  of  unit 
cardinality,  its  success  on  these  instances  is  expected. 

Results for the pairwise-correlated instances are summarized in Table 6. Both GA and GAE perform
rather poorly. On the other hand, EGA (especially EGA-I) is successful in detecting correlation
between  pairs  of  agents  and  jobs.  Since  correlation  is  between  pairs  of  jobs  but  not  larger  subsets, 
local  search  is  also  successful  and  is  able  to  find  an  optimal  solution  in  almost  all  cases. 

Results  for  the  split-like  instances  are  summarized  in  Table  7.  One  can  observe  that  CPLEX 
runs  out  of  time  for  larger  instances.  Solutions  found  by  GA  and  GAE  are  poor.  In  fact,  GA  and  GAE 
find  almost  the  same  solutions.  On  the  other  hand,  solution  of  EGA-I  is  only  1%  worse  than  the 
CPLEX  solution  on  average.  EGA-II  performs  rather  poorly  for  large  instances;  however  the  local 
search  procedure  is  able  to  eliminate  this  deviation  as  shown  in  EGA-II  LS.  The  reason  that  EGA-I 
Table 4: Results for uncorrelated instances.

Scenarios  Agents  Cplex T    GA-I   GA-II  GA T   GAE-I  GAE T   EGA-I  EGA-II  EGA T  EGA-II LS  EGA-II LS T
        2      10        0    9.60   10.24     0    2.00      0    2.00    6.36      0       3.72            0
               20        0   10.03    9.15     0    5.33      0    5.33    4.36      0       2.71            0
               50        0    6.88    6.95     0    4.55      0    4.55    3.88      0       3.22            0
              100        1    6.24    6.28     0    4.80      0    4.80    3.23      0       3.08            0
              200       10    5.16    5.36     0    4.53      0    4.50    2.70      0       2.67            0
        3      10        0    9.33    8.84     0    3.48      0    3.48    6.83      0       3.84            0
               20        0    6.87    8.05     0    3.77      0    3.77    6.15      0       4.96            0
               50        0    6.34    6.65     0    5.65      0    5.65    5.42      0       4.79            0
              100        7    5.37    5.10     0    5.03      0    4.83    3.65      0       3.42            0
              200      325    4.65    4.74     0    4.55      0    4.21    3.13      0       3.09            0
        5      10        0   12.11    8.82     0    6.85      0    5.68    8.51      0       2.63            0
               20        0    7.85    6.74     0    6.63      0    5.71    6.55      0       5.48            0
               50        1    5.25    5.53     0    5.10      0    3.97    4.66      0       4.40            0
              100      119    4.22    4.10     0    4.20      0    3.51    3.55      0       3.36            0
              200     2113    3.98    3.68     0    3.98      0    3.51    3.06      2       3.02            0
       10      10        0    8.46    8.16     0    6.21      0    5.65    7.51      0       2.53            0
               20        0    4.42    6.56     0    4.26      0    3.96    6.56      0       4.81            0
               50       26    4.63    4.81     0    4.58      0    3.74    4.40      0       4.07            0
              100     1536    4.26    3.58     0    4.26      0    3.31    3.41      0       3.40            0
              200     3600    3.73    3.49     0    3.73      0    3.40    2.93      3       2.82            0
       20      10        0    4.47    9.87     0    2.34      0    2.34    9.87      0       4.39            0
               20        0    5.86    4.39     0    5.47      0    4.50    4.39      0       3.92            0
               50       96    4.42    4.03     0    4.42      0    3.31    4.03      0       3.57            0
              100     2391    3.81    3.80     0    3.81      0    3.32    3.71      0       3.71            0
              200     3600    0.43    0.76     1    0.43      1    0.23    0.52      5       0.20            0

Table 5: Results for correlated instances.

Scenarios  Agents  Cplex T    GA-I   GA-II  GA T   GAE-I  GAE T   EGA-I  EGA-II  EGA T  EGA-II LS  EGA-II LS T
        2      10        0    6.89    2.49     0    0.32      0    0.19    0.13      0       0.09            0
               20        0    4.65    1.90     0    0.56      0    0.55    0.13      0       0.12            0
               50        0    4.98    1.18     0    0.61      0    0.60    0.23      0       0.20            0
              100        0    4.80    1.05     0    0.73      0    0.72    0.15      0       0.13            0
              200        2    4.76    0.82     0    0.81      0    0.80    0.16      0       0.15            0
        3      10        0    5.65    1.70     0    0.73      0    0.31    0.18      0       0.15            0
               20        0    4.86    1.24     0    0.69      0    0.57    0.19      0       0.17            0
               50        0    4.78    0.67     0    1.00      0    1.00    0.15      0       0.15            0
              100        0    4.54    0.66     0    1.18      0    1.18    0.22      0       0.21            0
              200        3    4.64    0.46     0    1.22      0    1.22    0.15      0       0.14            0
        5      10        0    4.57    1.06     0    1.13      0    0.95    0.41      0       0.27            0
               20        0    4.20    0.65     0    1.20      0    0.87    0.16      0       0.15            0
               50        0    4.27    0.45     0    1.43      0    1.43    0.19      0       0.15            0
              100        0    4.60    0.32     0    1.61      0    1.59    0.15      0       0.14            0
              200        5    4.48    0.24     0    1.63      0    1.63    0.09      0       0.09            0
       10      10        0    4.26    0.50     0    1.17      0    1.17    0.14      0       0.13            0
               20        0    4.12    0.46     0    1.84      0    1.42    0.28      0       0.20            0
               50        0    4.05    0.25     0    1.97      0    1.97    0.13      0       0.11            0
              100        1    4.23    0.20     0    2.02      0    1.82    0.10      0       0.08            0
              200       10    4.22    0.14     0    2.03      0    1.38    0.07      1       0.07            0
       20      10        0    3.44    0.28     0    1.71      0    0.80    0.10      0       0.07            0
               20        0    3.90    0.19     0    1.91      0    1.18    0.10      0       0.07            0
               50        0    4.07    0.10     0    2.21      0    1.33    0.06      0       0.05            0
              100        3    4.00    0.11     0    2.23      0    1.51    0.07      0       0.06            0
              200     2526    4.12    0.06     1    2.39      1    0.52    0.04      2       0.04            0

Table 6: Results for pairwise-correlated instances.

Scenarios  Agents  Cplex T    GA-I   GA-II  GA T   GAE-I  GAE T   EGA-I  EGA-II  EGA T  EGA-II LS  EGA-II LS T
        2      10        0   11.35   36.30     0   10.91      0    3.43    0.00      0       0.00            0
               20        0   14.14   31.56     0   13.29      0    3.04    0.44      0       0.00            0
               50        0   15.31   28.55     0   14.51      0    4.21    1.16      0       0.00            0
              100        0   16.34   27.45     0   15.13      0    4.55    0.42      0       0.00            0
              200        1   16.38   26.36     0   14.92      0    4.49    0.63      0       0.00            0
        3      10        0   13.87   35.07     0    9.62      0    3.29    0.00      0       0.00            0
               20        0   14.65   30.45     0   11.18      0    4.46    3.48      0       0.00            0
               50        0   14.66   28.54     0   10.55      0    3.56    4.02      0       0.00            0
              100        0   16.08   27.41     0   11.20      0    5.53    2.37      0       0.00            0
              200        1   15.66   26.12     0   11.04      0    4.88    0.60      0       0.00            0
        5      10        0   14.57   35.08     0   13.72      0    5.54    0.00      0       0.00            0
               20        0   14.87   30.51     0   12.86      0    3.75    5.23      0       0.00            0
               50        0   15.46   28.32     0   14.22      0    4.12    3.65      0       0.00            0
              100        0   15.42   27.57     0   13.88      0    4.19    3.84      0       0.01            0
              200        2   15.97   26.31     0   13.83      0    4.25    4.76      0       0.00            0
       10      10        0   14.87   36.65     0   14.71      0    2.32    0.00      0       0.00            0
               20        0   13.14   30.83     0   12.96      0    0.97   14.76      0       0.00            0
               50        0   14.82   28.53     0   14.57      0    2.26   18.68      0       0.00            0
              100        1   15.72   27.79     0   15.29      0    2.85    9.22      0       0.00            0
              200        5   15.60   26.66     0   15.16      0    2.41    8.70      1       0.01            0
       20      10        0   14.87   35.35     0   14.70      0    2.99    3.92      0       0.00            0
               20        0   16.04   30.10     0   15.48      0    4.28    8.80      0       0.00            0
               50        0   15.67   28.71     0   15.00      0    3.25   14.11      0       0.00            0
              100        2   16.17   27.28     0   15.88      0    2.61   19.12      0       0.00            0
              200     2524   15.91   26.18     1   15.50      1    2.57   15.70      3       0.00            0

is successful on this class of instances is that it is the only algorithm that can move the whole set $G_1$
of agents and jobs to the second stage as this set of agents and jobs does not satisfy the necessary
optimality condition. It takes only 5 seconds for EGA to find a very good solution to the largest
problem instance, whereas CPLEX is considerably slower.

Table  8  reports  results  for  the  interleaved-like  instances.  This  time  EGA-II  is  the  best  of  all 
the  algorithms  considered.  Its  solution  is  only  about  0.3%  worse  than  the  CPLEX  solution.  Since 
EGA-II is already very successful, we do not expect much improvement from the local search heuristic.
It is also interesting to notice that, contrary to the case for EGA, the first-stage solution is better
than the second-stage solution for GA and GAE. In terms of time requirements, EGA requires at most
3 seconds for all test instances.

In summary, we should point out that the results for the pairwise-correlated as well as split-like
and  interleaved-like  instances  indicate  that  if  the  weight  dependencies  between  subsets  of  agents 
and  jobs  become  more  complicated  (e.g.,  in  comparison  with  the  correlated  instances)  then  GA  and 
GAE  algorithms  fail  to  correctly  identify  the  proper  assignments  between  such  subsets  of  agents  and 
jobs.  This  is  due  to  the  fact  that  these  algorithms  check  the  necessary  optimality  conditions  only 
for  subsets  of  size  1  and  n.  On  the  other  hand,  EGA  is  specifically  designed  to  locate  some  of  such 
subsets  of  agents  and  jobs,  thus  significantly  improving  the  quality  of  the  obtained  solutions. 

6.6  Concluding  Remarks 

In  this  chapter  we  discuss  several  greedy  approximation  algorithms  for  the  2SSLA  problem.  The 
proposed necessary optimality condition unifies two recent greedy approximation algorithms from
the literature, and aids in the development of a more advanced approach. While EGA preserves
the  approximation  guarantees  of  GAE,  we  are  not  able  to  prove  whether  EGA  provides  a  better 
approximation  bound.  However,  analytical  observations  and  computational  results  indicate  that  EGA 
has  strictly  better  results  on  some  rather  broad  classes  of  the  two-stage  stochastic  linear  assignment 
problem. 

As  future  research  directions,  one  can  use  the  proposed  necessary  optimality  condition  to  develop 
new  algorithms  with  better  approximation  guarantees,  consider  the  extension  to  the  multi-stage 
stochastic  linear  assignment  problem,  or  concentrate  on  the  problems  with  stochastic  right-hand 
sides,  e.g.,  when  multiple  jobs  can  be  performed  by  the  same  agent.  Furthermore,  the  results  of 
the  reported  computational  experiments  indicate  that  the  integrality  gap  is  very  small  for  most  of 
the  considered  test  instances.  Thus,  development  of  approximation  algorithms  based  on  the  LP 
relaxation  of  the  original  integer  program  is  among  promising  research  directions.  We  are  also 
currently working on extending these results to some classes of nonlinear assignment problems.
Table 7: Results for split-like instances.

Scenarios  Agents  Cplex T    GA-I   GA-II  GA T   GAE-I  GAE T   EGA-I  EGA-II  EGA T  EGA-II LS  EGA-II LS T
        2      10        0   22.01    8.41     0   15.39      0    3.17    1.75      0       0.85            0
               20        0   17.23   11.89     0   13.89      0    2.00    3.66      0       1.03            0
               50        0   14.41   15.60     0   13.30      0    1.20    6.93      0       1.04            0
              100        0   13.63   16.73     0   13.19      0    0.73    6.22      0       0.78            0
              200        5   12.73   17.12     0   12.57      0    0.33    6.71      0       0.51            0
        3      10        0   25.43    4.01     0   21.20      0    2.67    0.94      0       0.41            0
               20        0   19.99   10.12     0   18.75      0    1.12    5.78      0       1.50            0
               50        0   19.04   14.62     0   18.85      0    0.71    9.69      0       1.35            0
              100        1   18.75   16.22     0   18.71      0    0.33    9.28      0       0.98            0
              200       27   18.32   16.84     0   18.32      0    0.18    7.94      0       0.51            0
        5      20        0   23.33    9.29     0   22.70      0    0.37    7.81      0       2.10            0
               50        0   23.27   14.43     0   23.21      0    0.43   11.84      0       1.73            0
              100        5   23.38   16.19     0   23.37      0    0.22   11.02      0       0.77            0
              200      860   22.91   16.79     0   22.91      0    0.11   10.18      1       0.48            0
       10      50        2   26.98   14.19     0   26.97      0    0.25   13.50      0       0.58            0
              100      120   26.77   16.10     0   26.77      0    0.07   13.97      0       0.41            0
              200     3600   26.41   16.80     0   26.41      0    0.11   12.86      2       0.34            0
       20     100     2892   28.52   16.20     0   28.52      0    0.16   15.84      0       0.45            0
              200     3600   28.18   16.77     1   28.18      1    0.10   15.38      5       0.27            0

Table 8: Results for interleaved-like instances.

Scenarios  Agents  Cplex T    GA-I   GA-II  GA T   GAE-I  GAE T   EGA-I  EGA-II  EGA T  EGA-II LS  EGA-II LS T
        2      10        0    6.55   30.21     0    5.67      0    5.67    0.57      0       0.30            0
               20        0    6.44   29.45     0    5.68      0    5.68    0.69      0       0.50            0
               50        0    5.92   29.57     0    5.76      0    5.75    0.32      0       0.28            0
              100        0    5.77   29.35     0    5.70      0    5.70    0.17      0       0.17            0
              200        3    5.62   29.30     0    5.60      0    5.60    0.09      0       0.09            0
        3      10        0    9.66   30.69     0    8.59      0    8.53    0.86      0       0.22            0
               20        0    9.01   29.31     0    8.40      0    8.32    0.81      0       0.62            0
               50        0    8.57   28.99     0    8.45      0    8.42    0.28      0       0.26            0
              100        1    8.34   29.24     0    8.32      0    8.31    0.19      0       0.18            0
              200       12    8.20   29.36     0    8.20      0    8.19    0.09      0       0.09            0
        5      20        0   10.72   29.29     0   10.36      0   10.36    0.69      0       0.54            0
               50        0   10.35   29.51     0   10.32      0   10.24    0.21      0       0.21            0
              100        4   10.38   29.43     0   10.37      0   10.33    0.14      0       0.14            0
              200      984   10.24   29.40     0   10.24      0   10.22    0.06      0       0.06            0
       10      50        1   12.01   28.75     0   11.99      0   11.90    0.14      0       0.14            0
              100       65   11.94   29.11     0   11.94      0   11.90    0.09      0       0.09            0
              200     3600   11.83   29.19     0   11.83      0   11.82    0.07      1       0.07            0
       20     100     3273   12.75   28.96     0   12.75      0   12.74    0.11      0       0.11            0
              200     3600   12.62   29.12     1   12.62      1   12.61    0.06      3       0.06            0


7  Multiple-Ratio  Fractional  Programming  Problems 

This  chapter  is  mostly  based  on  the  results  from: 

•  O.  Ursulenko,  S.  Butenko,  O.A.  Prokopyev,  “A  Global  Optimization  Algorithm  for  Solving 
the  Minimum  Multiple  Ratio  Spanning  Tree  Problem,”  Technical  report,  2010. 


7.1  Introduction 


A  fractional  combinatorial  optimization  problem  is  defined  as  follows: 


\[
\min_{x \in X} \frac{f(x)}{g(x)}, \tag{188}
\]

where $X \subseteq \{0, 1\}^p$ is a set of certain combinatorial structures, and $f$ and $g$ are real-valued functions
defined on $X$. In addition, it is common to assume that $g(x) > 0$ for all $x \in X$ [129].

One of the classical fractional combinatorial optimization problems is the minimum ratio spanning tree (MRST) problem [35], which is defined as follows. Consider a graph $G = (V, E)$ with the set $V$ of $n$ vertices and the set $E$ of $m$ edges. Given a pair of numbers $(a_{ij}, b_{ij})$ for each edge $(i,j) \in E$, find a spanning tree $\tau^*$ that solves

\[
\min_{\tau \in T} \frac{\sum_{(i,j) \in \tau} a_{ij}}{\sum_{(i,j) \in \tau} b_{ij}}, \tag{189}
\]

where $T$ denotes the set of all spanning trees of $G$.

The practical applications of this problem include the minimal cost-reliability ratio spanning tree problem [36], where the functions in the numerator and the denominator of (189) represent the cost and the reliability of the spanning tree $\tau \in T$, respectively. This problem can be solved in polynomial time using $O(|E|^{5/2} \log \log |V|)$ arithmetic operations [36, 37, 85]. Closely related classes of problems, where $X$ consists of cycles, paths, or cuts in graph $G$, also admit polynomial time solution approaches [8, 105, 128, 129]. An example of such a problem is the minimum cost-to-time ratio cycle problem, also known as the tramp steamer problem [8]. A short survey on fractional combinatorial optimization problems and related solution approaches can be found in [129].
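To make the single-ratio problem (189) concrete, the following Python sketch solves a small MRST instance with a Dinkelbach-type parametric iteration: for a trial ratio $\lambda$ it computes a minimum spanning tree under edge weights $a_{ij} - \lambda b_{ij}$ and updates $\lambda$ until no tree has a smaller ratio. This is a standard technique swapped in for illustration; it is not the $O(|E|^{5/2} \log \log |V|)$ method cited above, and the function names and the example graph are assumptions.

```python
from fractions import Fraction
from itertools import combinations

def kruskal(n, edges, key):
    """Return a minimum spanning forest (list of edges) under weight key(e)."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for e in sorted(edges, key=key):
        ru, rv = find(e[0]), find(e[1])
        if ru != rv:
            parent[ru] = rv
            tree.append(e)
    return tree

def ratio(tree):
    """Sum of a over sum of b for edges given as (u, v, a, b)."""
    return Fraction(sum(e[2] for e in tree), sum(e[3] for e in tree))

def min_ratio_spanning_tree(n, edges):
    """Dinkelbach-type iteration; assumes a connected graph and b > 0 on every edge."""
    tree = kruskal(n, edges, key=lambda e: e[3])      # any spanning tree is a valid start
    lam = ratio(tree)
    while True:
        cand = kruskal(n, edges, key=lambda e: e[2] - lam * e[3])
        if sum(e[2] - lam * e[3] for e in cand) >= 0:
            return tree, lam                          # no tree has ratio below lam
        tree, lam = cand, ratio(cand)                 # strictly smaller ratio found

def brute_force_ratio(n, edges):
    """Enumerate all spanning trees; only sensible for tiny graphs."""
    best = None
    for subset in combinations(edges, n - 1):
        if len(kruskal(n, list(subset), key=lambda e: 0)) == n - 1:
            r = ratio(subset)
            best = r if best is None or r < best else best
    return best

# Hypothetical 4-vertex example; each edge is (u, v, a, b).
edges = [(0, 1, 4, 2), (0, 2, 1, 3), (0, 3, 3, 1),
         (1, 2, 2, 2), (1, 3, 5, 4), (2, 3, 2, 1)]
tree, lam = min_ratio_spanning_tree(4, edges)
print(lam, brute_force_ratio(4, edges))
```

Exact rational arithmetic (`Fraction`) is used so that the stopping test is not affected by rounding; on this example both routines agree on the ratio 5/6.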

Recently, Skiscim and Palocsay [140, 141] have considered a generalization of the MRST problem, where the objective function is given by the sum of two ratios. The resulting two ratio minimum spanning tree (TRMST) problem is defined as follows. Consider a graph $G = (V, E)$ with the set $V$ of $n$ vertices and the set $E$ of $m$ edges. Given a set of 4 real positive numbers $(a_{ij}, b_{ij}, c_{ij}, d_{ij})$ for each edge $(i,j) \in E$, find a spanning tree $\tau^*$ that solves

\[
\min_{\tau \in T} \frac{\sum_{(i,j) \in \tau} a_{ij}}{\sum_{(i,j) \in \tau} b_{ij}} + \frac{\sum_{(i,j) \in \tau} c_{ij}}{\sum_{(i,j) \in \tau} d_{ij}}, \tag{190}
\]

where $T$ denotes the set of all spanning trees of $G$.

A  closely  related  class  of  combinatorial  optimization  problems  is  optimization  of  the  ratio  of  two 
linear  0-1  functions: 


\[
\max_{x \in \{0,1\}^n} f(x) = \frac{a_0 + \sum_{i=1}^{n} a_i x_i}{b_0 + \sum_{i=1}^{n} b_i x_i}. \tag{191}
\]


This  problem  is  a  special  case  of  (188)  and  is  usually  referred  to  as  a  single-ratio  hyperbolic  0-1 
programming  problem  or  single-ratio  fractional  0-1  programming  problem  [25].  In  a  generalization 
of this problem one considers the sum of ratios of linear 0-1 functions in the objective:

\[
\max_{x \in \{0,1\}^n} f(x) = \sum_{r=1}^{k} \frac{a_{r0} + \sum_{i=1}^{n} a_{ri} x_i}{b_{r0} + \sum_{i=1}^{n} b_{ri} x_i}. \tag{192}
\]


This  problem  is  known  as  the  multiple-ratio  hyperbolic  (fractional)  0-1  programming  problem  [127, 
145].  A  short  survey  of  the  literature  dealing  with  the  fractional  0-1  programming  problems  can 
be  found  in  [125].  Applications  of  constrained  and  unconstrained  versions  of  these  problems  can  be 
found  in  service  systems  design  [51],  facility  location  [145],  query  optimization  in  data  bases  and 
information  retrieval  [77],  data  mining  [29],  etc. 

Both the minimum ratio spanning tree problem and the single-ratio hyperbolic 0-1 programming
problem are polynomially solvable if the denominator is always positive, but become NP-hard if
the denominator can take both positive and negative values [77, 126, 140]. On the other hand, their
multiple-ratio versions (190) and (192) are NP-hard for two ratios, even if all denominators are
always positive [127, 140]. Some other complexity aspects of unconstrained single- and multiple-ratio
fractional 0-1 programming problems, including complexity of uniqueness, approximability
and local search, are addressed in [126, 127].

Generally speaking, multiple-ratio problems appear in the case of multiple fractional performance metrics that need to be optimized, e.g., a fleet of cargo ships in the tramp steamer problem. Related discussion can be found in [39, 135, 140, 141] and references therein. Analogously with the definition of the multiple-ratio hyperbolic 0-1 programming problem, the multiple-ratio fractional combinatorial optimization (MRFCO) problem is defined as

\[
\min_{x \in X} \sum_{r=1}^{k} \frac{f_r(x)}{g_r(x)}, \tag{193}
\]

where $X \subseteq \{0, 1\}^p$ is a set of certain combinatorial structures, and $f_r$ and $g_r$, $r = 1, \ldots, k$, are real-valued functions defined on $X$.

Another possible application of MRFCO problems is to consider the original single-ratio problem in a stochastic environment. Suppose the input data can be described by a finite number of possible scenarios with corresponding probabilities $p_s$, which is a typical assumption in stochastic optimization literature [23]. Assume also that the original metric is given by $f_s(x)/g_s(x)$ for each scenario $s$, $s = 1, \ldots, S$. Then designing a combinatorial structure $x \in X$ with the minimum expected cost reduces to the MRFCO problem:

\[
\min_{x \in X} \sum_{s=1}^{S} p_s \frac{f_s(x)}{g_s(x)}. \tag{194}
\]


Obviously, the TRMST problem mentioned above is a simple example of the MRFCO problem. The multiple-ratio version of the MRST problem is formulated as follows. Let $G = (V, E)$ be a graph with the set $V$ of $n$ vertices and the set $E$ of $m$ edges. Given $k$ pairs of real positive numbers $(a^1_{ij}, b^1_{ij})$, $(a^2_{ij}, b^2_{ij})$, ..., $(a^k_{ij}, b^k_{ij})$ for each edge $(i,j) \in E$, the minimum multiple-ratio spanning tree (MMRST) problem is to find a spanning tree $\tau^*$ that solves

\[
\min_{\tau \in T} \sum_{r=1}^{k} \frac{\sum_{(i,j) \in \tau} a^r_{ij}}{\sum_{(i,j) \in \tau} b^r_{ij}}, \tag{195}
\]

where $T$ denotes the set of all spanning trees of $G$. Note that, similarly to [141], we assume that all the coefficients in the pairs $(a^1_{ij}, b^1_{ij})$, $(a^2_{ij}, b^2_{ij})$, ..., $(a^k_{ij}, b^k_{ij})$ are positive for each edge $(i,j) \in E$. The MMRST problem may naturally arise, e.g., in applications where one is looking for an optimal connected configuration in a network that serves $k$ users, $r = 1, \ldots, k$, each of which has its own set of “cost” and “benefit” pairs $\{(a^r_{ij}, b^r_{ij}) : (i,j) \in E\}$ associated with edges in $E$.
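The MMRST objective can be illustrated on a tiny graph by enumerating all spanning trees and summing the $k$ ratios for each. The Python sketch below does exactly that; it is a brute-force illustration under assumed data (a triangle with $k = 2$ users), not the branch-and-bound algorithm developed in this chapter, and all names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def is_spanning_tree(n, tree_edges):
    """True if the given edges form a spanning tree on vertices 0..n-1."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in tree_edges:
        ru, rv = find(u), find(v)
        if ru == rv:               # adding (u, v) would close a cycle
            return False
        parent[ru] = rv
    return len(tree_edges) == n - 1

def mmrst_objective(tree, a, b):
    """Sum over r of (sum of a[r][e]) / (sum of b[r][e]) for edge indices e in tree."""
    return sum(
        Fraction(sum(a[r][e] for e in tree), sum(b[r][e] for e in tree))
        for r in range(len(a))
    )

def brute_force_mmrst(n, edges, a, b):
    """Enumerate all (n-1)-edge subsets and keep the best spanning tree."""
    best_tree, best_value = None, None
    for tree in combinations(range(len(edges)), n - 1):
        if not is_spanning_tree(n, [edges[e] for e in tree]):
            continue
        value = mmrst_objective(tree, a, b)
        if best_value is None or value < best_value:
            best_tree, best_value = tree, value
    return best_tree, best_value

# Hypothetical data: triangle with edges e0, e1, e2 and k = 2 cost/benefit pairs per edge.
edges = [(0, 1), (1, 2), (0, 2)]
a = [[1, 2, 3], [3, 1, 1]]     # a[r][e]: "cost" of edge e for user r
b = [[1, 1, 1], [1, 2, 1]]     # b[r][e]: "benefit" of edge e for user r
print(brute_force_mmrst(3, edges, a, b))
```

Enumeration grows exponentially with the graph size, which is one reason a dedicated global optimization approach is needed for larger instances.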

In this chapter we develop a global branch-and-bound approach for solving the MMRST problem
based on representing the problem in the image space pioneered by Falk and Palocsay for general
fractional programming [53, 54]. The suggested algorithm has evolved from the ideas behind the
work on two-ratio minimum spanning trees by Skiscim and Palocsay [140].

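The image space of the feasible set $T$ [54] is obtained via introducing a mapping $M : T \to \mathbb{R}^k$, such that

\[
M(x) = \left( \frac{a_1^T x}{b_1^T x}, \frac{a_2^T x}{b_2^T x}, \ldots, \frac{a_k^T x}{b_k^T x} \right), \quad x \in T. \tag{196}
\]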

The idea of the image space became popular in research related to solving the problems involving the sum of ratios. One reason is that using the image space may significantly reduce the computational burden when $k \ll n$, which is usually the case in practical applications. This especially applies to our case, since for combinatorial problems like MST the dimension of the original feasible region is often extremely large. Another reason is that, when translated to $\mathbb{R}^k$, the MMRST problem (195) is equivalent to the linear program

\[
\min\ e^T y \quad \text{subject to} \quad y \in \operatorname{conv}(M(T)), \tag{197}
\]

where $e$ denotes the vector of all ones of the corresponding dimension. Unfortunately, we neither have a description of $\operatorname{conv}(M(T))$, nor is there a systematic way of generating its facets or extreme points. It may be possible, however, to build an approximation of $\operatorname{conv}(M(T))$ that is accurate enough in the neighborhood of an optimal extreme point $y^*$ to guarantee a solution as close to $y^*$ as needed. This is precisely the idea our algorithm is based on.
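A small numerical illustration of the image-space view follows: each spanning tree is represented by its 0-1 incidence vector $x$, mapped to the point $M(x) \in \mathbb{R}^k$, and the linear objective $e^T y$ is minimized over the resulting finite set of points. Since a linear function attains its minimum over a convex hull at one of the generating points, this matches (197) on such a toy instance. The data and names below are assumptions chosen for illustration.

```python
from fractions import Fraction

def image_point(x, a, b):
    """Return M(x) = (a_1^T x / b_1^T x, ..., a_k^T x / b_k^T x) for a 0-1 vector x."""
    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))
    return tuple(Fraction(dot(ar, x), dot(br, x)) for ar, br in zip(a, b))

# Hypothetical data: the three spanning trees of a triangle, as incidence vectors
# over the edges (e0, e1, e2), with k = 2 ratios.
trees = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]
a = [(1, 2, 3), (3, 1, 1)]
b = [(1, 1, 1), (1, 2, 1)]

Y = {image_point(x, a, b) for x in trees}   # finite image of the feasible set
best = min(Y, key=sum)                      # minimizes e^T y over the image points
print(sorted(Y), best, sum(best))
```

Here the image lives in $\mathbb{R}^2$ regardless of how many edges the graph has, which is the dimension reduction the text refers to when $k \ll n$.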

The  rest  of  this  chapter  is  organized  as  follows.  Section  7.2  provides  a  detailed  description  of 
the  developed  global  optimization  algorithm  and  the  proof  of  its  convergence.  The  computational 
results  are  discussed  in  Section  7.3.  Section  7.4  outlines  some  directions  for  future  research.  For 
graph  theory  definitions  used  in  the  chapter  and  for  a  recent  detailed  bibliography  of  fractional 
programming  we  refer  the  reader  to  [8]  and  [143],  respectively. 


7.2  A  Global  Optimization  Approach 

This  section  develops  a  global  optimization  approach  for  the  MMRST  problem.  Its  first  subsection 
provides  the  description  of  the  proposed  algorithm  and  establishes  convergence,  while  the  remaining 
two  subsections  address  two  important  aspects  of  the  algorithm,  namely,  solving  a  subproblem  and 
partitioning  the  feasible  region,  respectively. 


7.2.1  Description  and  convergence  of  the  main  algorithm 

In order to proceed with the description of the algorithm, let us introduce some additional notation
that we use throughout the rest of the chapter. Recall the definition of the image $Y = M(T)$ of
the feasible set $T$ of the MMRST problem introduced in (196), where $M : T \to \mathbb{R}^k$ is given by
$M(x) = (a_1^{\top} x / b_1^{\top} x, \ldots, a_k^{\top} x / b_k^{\top} x)$ for any $x \in T$. Given $x \in T$, we will
denote by $M_r(x)$ the $r$-th ratio $a_r^{\top} x / b_r^{\top} x$. Given $y \in Y$, we will denote by $M^{-1}(y)$
the inverse image $\{x \in T : M(x) = y\}$. Note that since $T$ is finite, $Y$ is also finite. For a
rectangular region $Q = \{y \in \mathbb{R}^k : l_r \le y_r \le u_r, \; r = 1, \ldots, k\}$, we denote the vector
$l = (l_1, \ldots, l_k)$ by $L(Q)$, and its $r$-th component by $L_r(Q)$. Similarly, $U(Q)$
and $U_r(Q)$ denote $u$ and $u_r$, respectively.

On each step $j$ of our algorithm, an approximation of the portion of $\operatorname{conv}(Y)$ containing an optimal
solution $y^*$ is given by a set of rectangular regions $S^j = \{Q^j_1, \ldots, Q^j_{t_j}\}$, such that for all steps $j$ and
$i$, where $j < i$, we have

1. $y^* \in \bigcup_{Q \in S^j} Q$;

2. $\bigcup_{Q \in S^i} Q \subseteq \bigcup_{Q \in S^j} Q$;

3. $y^j \in \bigcup_{Q \in S^j} Q$ and $y^i \in \bigcup_{Q \in S^i} Q$ are available s.t. $e^{\top} y^* \le e^{\top} y^i \le e^{\top} y^j$.

Note that $e^{\top} L(Q^j_p)$ provides a lower bound on the optimal objective of (197) over the rectangle
$Q^j_p$. Without loss of generality, we can assume that on every step $j$ the rectangular regions in
the set $S^j$ are sorted in the nondecreasing order of such lower bounds, i.e., we have
$e^{\top} L(Q^j_p) \le e^{\top} L(Q^j_q)$ for all $p < q$. Then $e^{\top} L(Q^j_1)$ provides the lower bound on (197)
available from the approximation $S^j$. Let us denote this lower bound by $\underline{z}^j$, and the current
upper bound, which is the value of the best feasible solution found so far, by $\bar{z}$. $S^{j+1}$ is obtained
from $S^j$ by reducing $Q^j_1$ and/or partitioning it into two subregions. The reduction is done similarly
to [140], by solving the following subproblem for a particular ratio $r \in \{1, \ldots, k\}$:

$$\min \{ y_r : y \in Y \cap Q^j_1, \; y_s \le u_s, \; s = 1, \ldots, k \}, \qquad (198)$$

where

$$u_s = \max \{ y_s : y \in Q^j_1, \; e^{\top} y \le \bar{z} \} = \bar{z} - \underline{z}^j + L_s(Q^j_1), \quad s = 1, \ldots, k.$$

Let $\hat{y}$ be an optimal solution to (198). Then $Q^j_1$ may be reduced to

$$P = \{ y \in Q^j_1 : y_s \le u_s, \; s = 1, \ldots, k, \; s \ne r, \; y_r \ge \hat{y}_r \}$$

without discarding any $y \in Y \cap Q^j_1$ that are no worse than the best incumbent solution to (197).
If $\hat{y}_r > L_r(Q^j_1)$, then $\underline{z}^{j+1}$ will be a better lower bound than $\underline{z}^j$. Certainly, $P$ is
discarded from further consideration if $e^{\top} L(P) \ge \bar{z}$. Otherwise, $\hat{y}$ may improve on the current
incumbent solution. If $\hat{y}_r = L_r(Q^j_1)$, then $P$ is partitioned into
$P' = \{ y \in P : y_h \le (L_h(P) + \hat{y}_h)/2 \}$ and $P'' = \{ y \in P : y_h \ge (L_h(P) + \hat{y}_h)/2 \}$, where

$$h = \arg\max \{ |\hat{y}_s - L_s(P)| : s = 1, \ldots, k \}. \qquad (199)$$

Thus $\hat{y}$ becomes separated from $L(P')$, making the next iteration likely to improve $\underline{z}$. Of course,
(198) does not have to be solved when $\hat{y} \in P$ such that $\hat{y}_r = L_r(Q^j_1)$ is already known from
previous steps of the algorithm.

The  formal  description  of  the  algorithm  is  provided  in  Algorithm  3.  Note  that  T  is  passed  to 
the  main  procedure  implicitly  through  the  description  of  graph  G.  We  discuss  how  the  subproblems 
(198)  are  solved  along  with  some  other  important  details  in  the  following  subsections. 

Theorem  7  Algorithm  3  converges  in  a  finite  number  of  steps. 
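
Algorithm 3:

Require: $G$; $a_r, b_r \in \mathbb{R}^n$, $r = 1, \ldots, k$; $0 < \varepsilon < 1$.

Ensure: $\bar{x}$, an $\varepsilon$-optimal solution to (195).

1: $y^r \leftarrow \arg\min \{ y_r : y \in Y \}$, $r = 1, \ldots, k$;
2: $\bar{y} \leftarrow \arg\min \{ e^{\top} y : y \in \{ y^1, \ldots, y^k \} \}$;
3: $\bar{z} \leftarrow e^{\top} \bar{y}$;
4: $Q \leftarrow \{ y \in \mathbb{R}^k : y_r \ge y^r_r, \; r = 1, \ldots, k \}$;
5: $S \leftarrow \{ Q \}$;
6: Choose $r \in \{1, \ldots, k\}$;
7: repeat
8:     $Q \leftarrow$ the first set in $S$;
9:     Remove $Q$ from $S$;
10:    $\underline{z} \leftarrow e^{\top} L(Q)$;
11:    $P \leftarrow \{ y \in Q : y_s \le \bar{z} - \underline{z} + L_s(Q), \; s = 1, \ldots, k \}$;
12:    $\hat{y} \leftarrow \arg\min \{ y_r : y \in Y \cap P \}$;
13:    if $e^{\top} \hat{y} < \bar{z}$ then
14:        $\bar{z} \leftarrow e^{\top} \hat{y}$;
15:        $\bar{y} \leftarrow \hat{y}$;
16:    end if
17:    if $\hat{y}_r = L_r(P)$ then
18:        Choose $h \in \{1, \ldots, k\}$ ($h \ne r$) that maximizes $\hat{y}_h - L_h(P)$;
19:        $P' \leftarrow \{ y \in P : y_h \le (L_h(P) + \hat{y}_h)/2 \}$;
20:        $P'' \leftarrow \{ y \in P : y_h \ge (L_h(P) + \hat{y}_h)/2 \}$;
21:        if $e^{\top} L(P'') < \bar{z}$ then
22:            $S \leftarrow S \cup \{ P'' \}$;
23:        end if
24:    else
25:        $P' \leftarrow \{ y \in P : y_r \ge \hat{y}_r \}$;
26:    end if
27:    if $e^{\top} L(P') < \bar{z}$ then
28:        $S \leftarrow S \cup \{ P' \}$;
29:    end if
30: until $\bar{z} - \underline{z} \le \varepsilon \bar{z}$
31: return $\bar{x} \in M^{-1}(\bar{y})$;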


Proof: Let $\bar{z}^j$ denote the value of $\bar{z}$ after $j$ steps of the algorithm. The algorithm terminates when
$\bar{z}^j - \underline{z}^j \le \varepsilon \bar{z}^j$. Suppose that the stopping criterion is not satisfied in a finite number of steps, i.e., the
algorithm generates infinite sequences of bounds $\{\underline{z}^j : j \ge 1\}$ and $\{\bar{z}^j : j \ge 1\}$. Since $\{\underline{z}^j : j \ge 1\}$
is monotonically nondecreasing, $\{\bar{z}^j : j \ge 1\}$ is monotonically nonincreasing, and $\underline{z}^j \le \bar{z}^j$ for any
$j \ge 1$, both sequences must converge:

$$\lim_{j \to \infty} \underline{z}^j = \underline{z}^*, \quad \lim_{j \to \infty} \bar{z}^j = \bar{z}^*, \quad \text{and} \quad \underline{z}^* < \bar{z}^*.$$

The last inequality is strict because of the assumption that we do not have a finite convergence of
the algorithm. Consider an arbitrary $\delta > 0$. We will show that there exists $\hat{\jmath}$ such that for any
$j \ge \hat{\jmath}$: $\bar{z}^j - \underline{z}^j < \delta$, thus obtaining a contradiction.

Note that finiteness of $Y$ guarantees that $\bar{z}$ improves after a finite number of steps, and the
lower bound can increase only due to one of the following two reasons:

1. $\hat{y}_r > L_r(P)$, in which case the lower bound increases by $\hat{y}_r - L_r(P)$;

2. $P'$ is not added to $S$, i.e., $e^{\top} L(P') \ge \bar{z}$, in which case the increase in lower bound value would
be $(\hat{y}_h - L_h(P))/2$.

Due to finiteness of $Y$ it is possible to choose $\delta_1 < \min \{ |y'_r - y''_r| : y', y'' \in Y, \; y'_r \ne y''_r \}$ and
$\delta_2 < \min \{ \delta_1, \delta/(2k) \}$, and due to convergence of $\{\underline{z}^j : j \ge 1\}$ there exists $\hat{\jmath}$ such that for any $j \ge \hat{\jmath}$
we have $|\underline{z}^j - \underline{z}^{j+1}| < \delta_2$. On the other hand, if $\underline{z}^j$ in the algorithm increases because $\hat{y}_r > L_r(P)$
then the increase must be at least $\delta_2$. Thus, if $j \ge \hat{\jmath}$, $\underline{z}^j$ can increase only due to the second reason,
and the corresponding increase $(\hat{y}_h - L_h(P))/2$, where $h$ is defined in line 18 of Algorithm 3, must
be less than $\delta_2$. Since $h$ maximizes $\hat{y}_s - L_s(P)$, $s = 1, \ldots, k$, and $\hat{y}$ is a feasible solution, we have

$$\bar{z}^j - \underline{z}^j \le e^{\top} \hat{y} - e^{\top} L(P) \le k (\hat{y}_h - L_h(P)) < 2k\delta_2 < \delta.$$

Thus, $\underline{z}^* = \bar{z}^*$, and we obtain the contradiction with our assumption that the stopping criterion is
not satisfied in a finite number of steps. The finite convergence follows. □
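
The box reduction and bisection logic of Algorithm 3 can be sketched on an explicitly listed image set. The sketch below is an illustration under strong assumptions, not the implementation used in this chapter: the finite set `Y` is given as a list of tuples (so the subproblem of line 12 is solved by scanning rather than by branch-and-bound), boxes are stored as (lower, upper) corner pairs, all objective values are assumed positive, `k >= 2`, and the stopping test is checked when a box is selected rather than at the end of the loop. The function name `box_search` is hypothetical.

```python
import math

def box_search(Y, r=0, eps=0.01):
    """Return (best point, its objective) for min e^T y over a finite list Y."""
    k = len(Y[0])
    lows = [min(y[s] for y in Y) for s in range(k)]            # lines 1 and 4
    y_best = min((min(Y, key=lambda y, s=s: y[s]) for s in range(k)), key=sum)
    z_up = sum(y_best)                                          # line 3
    boxes = [(lows, [math.inf] * k)]                            # line 5
    while boxes:
        boxes.sort(key=lambda box: sum(box[0]))                 # smallest bound first
        lo, up = boxes.pop(0)                                   # lines 8-9
        z_low = sum(lo)                                         # line 10
        if z_up - z_low <= eps * z_up:                          # stopping test
            break
        u = [min(up[s], z_up - z_low + lo[s]) for s in range(k)]    # line 11
        inside = [y for y in Y
                  if all(lo[s] <= y[s] <= u[s] for s in range(k))]
        if not inside:                                          # empty box: drop it
            continue
        y_hat = min(inside, key=lambda y: y[r])                 # line 12 by scanning
        if sum(y_hat) < z_up:                                   # lines 13-16
            z_up, y_best = sum(y_hat), y_hat
        children = []
        if y_hat[r] == lo[r]:                                   # line 17: bisect
            h = max((s for s in range(k) if s != r),
                    key=lambda s: y_hat[s] - lo[s])             # line 18
            mid = (lo[h] + y_hat[h]) / 2.0
            children.append((lo, u[:h] + [mid] + u[h + 1:]))    # P'
            children.append((lo[:h] + [mid] + lo[h + 1:], u))   # P''
        else:                                                   # line 25: raise L_r
            children.append((lo[:r] + [y_hat[r]] + lo[r + 1:], u))
        # keep only boxes whose lower bound can still beat the incumbent
        boxes.extend(c for c in children if sum(c[0]) < z_up)
    return y_best, z_up

Y = [(1.0, 3.0, 2.0), (2.0, 1.0, 2.5), (3.0, 2.0, 0.5),
     (1.5, 1.5, 1.5), (2.0, 2.0, 2.0)]
best_point, best_value = box_search(Y)
```

On this made-up instance the incumbent is found after two boxes are processed, and the remaining iterations only tighten the lower bound, which mirrors the behavior reported later in the computational results.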

7.2.2  Solving  the  subproblem 

Computational complexity of each iteration of the algorithm described above is defined by the
complexity of solving the subproblem (198); therefore, it is imperative to solve this problem efficiently.
Returning to the original variable $x$, for a rectangular region $Q \subset \mathbb{R}^k$ it is formulated as

$$\min \; a_r^{\top} x / b_r^{\top} x \qquad (200)$$

$$\text{subject to} \quad x \in T \cap B, \qquad (200a)$$

where $B$ defines $Q$ in terms of $x$:

$$(a_i - U_i(Q) b_i)^{\top} x \le 0, \quad i = 1, \ldots, k; \qquad (200b)$$

$$(L_i(Q) b_i - a_i)^{\top} x \le 0, \quad i = 1, \ldots, k. \qquad (200c)$$


The constrained minimum ratio spanning tree (CMRST) problem (200) above is a generalization
of the capacity-constrained version of the MST problem. Unfortunately, the latter problem is NP-hard,
as shown by Aggarwal et al. [6], even in the case of one constraint. Unless we specifically
mention otherwise, $L_r(Q)$ and $U_r(Q)$ in (200b)-(200c) should be assumed to be $-\infty$ and $\infty$, respectively,
i.e., $k = 2$ refers to a single constraint case of (200).
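
As a small illustration of how the box $Q$ translates into linear constraints on $x$, the sketch below checks (200b)-(200c) for a given incidence vector. It assumes $b_i^{\top} x > 0$, in which case the linear inequalities are equivalent to $L_i(Q) \le a_i^{\top} x / b_i^{\top} x \le U_i(Q)$; infinite bounds are skipped explicitly. The function name and the data are made up for this example.

```python
import math

def satisfies_box(x, a, b, L, U):
    """Check (a_i - U_i b_i)^T x <= 0 and (L_i b_i - a_i)^T x <= 0 for all i."""
    for a_i, b_i, L_i, U_i in zip(a, b, L, U):
        ax = sum(ai * xi for ai, xi in zip(a_i, x))
        bx = sum(bi * xi for bi, xi in zip(b_i, x))
        if U_i != math.inf and ax - U_i * bx > 0:     # violates (200b)
            return False
        if L_i != -math.inf and L_i * bx - ax > 0:    # violates (200c)
            return False
    return True

# Made-up data: x selects edges 0 and 1; the two ratios are 1.5 and 1.0.
x = [1, 1, 0]
a = [[1.0, 2.0, 3.0], [3.0, 1.0, 2.0]]
b = [[1.0, 1.0, 1.0], [2.0, 2.0, 1.0]]

ok = satisfies_box(x, a, b, L=[-math.inf, 0.5], U=[math.inf, 1.2])
too_tight = satisfies_box(x, a, b, L=[-math.inf, 0.5], U=[math.inf, 0.9])
```

Writing the box constraints in this multiplied-out form keeps them linear in $x$, which is what allows them to be dualized in the Lagrangian approach described next.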

An  effective  branch-and-bound  approach  is  suggested  in  [6]  for  the  MST  problem  with  a  single 
capacity  constraint.  This  approach  can  be  directly  extended  to  our  problem  when  k  =  2,  but 
because  it  heavily  exploits  the  ability  to  obtain  solutions  that  satisfy  the  capacity  constraint,  further 
generalization  for  k  >  2  is  difficult,  if  at  all  possible.  In  fact,  the  case  of  multiple  capacity  constraints 
in  such  classical  combinatorial  optimization  problems  as  the  MST  problem  and  the  shortest  path 
problem  is  not  addressed  in  the  literature.  Therefore,  we  have  developed  our  own  branch-and-bound 
approach  for  solving  the  general  CMRST  problem  when  k  >  2. 

Each node $\mathcal{N}$ of our branch-and-bound tree is characterized by the sets
$F^0_{\mathcal{N}} = \{ e \in E(G) : x_e \text{ is fixed to } 0 \}$ and $F^1_{\mathcal{N}} = \{ e \in E(G) : x_e \text{ is fixed to } 1 \}$. To obtain a good lower bound on
the objective in each node, we dualize the constraints (200b) and (200c) and solve the fractional
Lagrangian dual introduced by Gol'stein in [71]. Assume, without loss of generality, that $r = 1$ in
(200). Then the respective Lagrangian dual problem is defined as

$$\max_{v \ge 0} \; \min_{x \in T} \; \mathcal{L}(x, v), \qquad (201)$$

where $v \in \mathbb{R}^{2k-2}$ and

$$\mathcal{L}(x, v) = \frac{a_1^{\top} x}{b_1^{\top} x} + \sum_{r=2}^{k} v_{r-1} \frac{(a_r - U_r(Q) b_r)^{\top} x}{b_1^{\top} x} + \sum_{r=2}^{k} v_{k+r-2} \frac{(L_r(Q) b_r - a_r)^{\top} x}{b_1^{\top} x}. \qquad (202)$$


We can solve (201), e.g., via some subgradient optimization algorithm [108]. We employ Kelley's
cutting plane method, since for moderate $k$ it converges fast for piecewise linear functions in practice.
Each new cutting plane generated by Kelley's method corresponds to a tree $x \in T$, which may
or may not be feasible for our CMRST problem. Depending on whether this is the case, the trees will
be used differently in computing a lower bound for the CMRST problem.

We adopt a branching rule similar to the one introduced in [6]. Suppose that solving the dual
problem in node $\mathcal{N}$ yields a solution $x \in T$ feasible to (200), and let $e_1, \ldots, e_p$ ($p \le m - 1$) be
all edges of the tree corresponding to $x$ that are not fixed in node $\mathcal{N}$. We produce $p$ child nodes
$\mathcal{M}_1, \ldots, \mathcal{M}_p$ of $\mathcal{N}$ by additionally fixing some of those edges at each child node. Specifically, a child
node $\mathcal{M}_j$ ($j = 1, \ldots, p$) is created by additionally fixing $j$ edges out of $e_1, \ldots, e_p$ according to the
rule:

$$F^1_{\mathcal{M}_j} = F^1_{\mathcal{N}} \cup \{ e_1, e_2, \ldots, e_{j-1} \}. \qquad (203)$$

Note that if $F^1_{\mathcal{N}}$ is a forest, then it is guaranteed that $F^1_{\mathcal{M}_j}$ is a forest. If several trees feasible to
(200) are available in $\mathcal{N}$, then we choose a tree that yields the best objective value.

However, it is possible that the procedure that solves (201) does not encounter a solution feasible
to (200). Then we use a different criterion for choosing the edges to branch upon. Let $\hat{v}$ be the
optimal solution to (201), and let the trees $t_1, \ldots, t_w$ define the hyperplanes that are tangent to the
lower epigraph of $\mathcal{L}(v) = \min_{x \in T} \mathcal{L}(x, v)$ at $\hat{v}$ for some $w \ge 1$. Since $\operatorname{epi} \mathcal{L} \subset \mathbb{R}^{2k-1}$, to define $\hat{v}$ uniquely
we need at least $2k - 1$ hyperplanes. Thus, eliminating $w - 2k + 2$ of the $w$ trees guarantees an increase
in the optimal value of $\mathcal{L}(v)$. Therefore, the branching should be performed on the edges of those
particular trees.

However, it may be difficult to obtain all hyperplanes tangent to the lower epigraph of $\mathcal{L}(v)$ at
$\hat{v}$. Instead, we branch on the edges of the trees $t_1, \ldots, t_a$, corresponding to the last $a$ hyperplanes
produced by Kelley's method to approximate the epigraph of $\mathcal{L}(v)$. We choose $p'$ edges occurring
most frequently in $t_1, \ldots, t_a$ that are not yet fixed, and produce $p'$ child nodes according to the
rule (203). Clearly, because in this case there is no guarantee for a child node $\mathcal{M}_j$ that $F^1_{\mathcal{M}_j}$ is a
forest, we have to check this fact, and discard the node if it is not.
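
The forest check mentioned above can be done with a standard union-find pass over the edges fixed to 1. The following sketch builds the fixed-to-1 sets of the child nodes as in rule (203) and discards those that contain a cycle. It covers only the fixed-to-1 sets; the handling of edges fixed to 0 is not modeled. The function names and the toy data are assumptions made for this illustration.

```python
def is_forest(n, edge_list):
    """Return True if the given edges on vertices 0..n-1 contain no cycle."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for u, v in edge_list:
        ru, rv = find(u), find(v)
        if ru == rv:                        # u and v already connected: cycle
            return False
        parent[ru] = rv
    return True

def child_fixed_sets(n, fixed_one, branch_edges):
    """Fixed-to-1 sets of child nodes j = 1..p, keeping only those that are forests."""
    children = []
    for j in range(1, len(branch_edges) + 1):
        candidate = fixed_one + branch_edges[:j - 1]   # F^1 of parent plus e_1..e_{j-1}
        if is_forest(n, candidate):
            children.append(candidate)
    return children

# Toy data: 4 vertices, edge (0, 1) already fixed to 1 in the parent node.
kids = child_fixed_sets(4, [(0, 1)], [(1, 2), (0, 2), (2, 3)])
```

In this example the third child would fix the edges (0, 1), (1, 2), and (0, 2) simultaneously, which form a cycle, so it is discarded.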

Solving (201) via Kelley's method, in turn, involves solving a sequence of problems of the
form

$$\min \; a^{\top} x / b^{\top} x, \qquad (204)$$

which is polynomially solvable. We solve (204) using Dinkelbach's method [50], which, again,
involves solving a sequence of MST problems. Consequently, to derive a lower bound for (200), we
examine a sequence of spanning trees of $G$, each of them being a feasible solution to the original
MMRST problem (195). Therefore, as we obtain each spanning tree, we examine the value of the
original objective that it yields, and improve the upper bound $\bar{z}$ whenever possible.
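
A minimal sketch of Dinkelbach's iteration for a single-ratio spanning tree problem of the form (204) is given below. It assumes that the minimization is over the spanning trees of a connected graph and that all denominators are positive; each iteration solves an ordinary MST problem with edge weights $a_e - \lambda b_e$ using Kruskal's algorithm. The function names, tolerance, and data are assumptions for this example and are not taken from the implementation described in this chapter.

```python
def kruskal(n, edges, weight):
    """Minimum spanning tree as a list of edge indices (Kruskal with union-find)."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for e in sorted(range(len(edges)), key=lambda e: weight[e]):
        ru, rv = find(edges[e][0]), find(edges[e][1])
        if ru != rv:
            parent[ru] = rv
            tree.append(e)
    return tree

def min_ratio_spanning_tree(n, edges, a, b, tol=1e-12, max_iter=100):
    """Dinkelbach iteration for min a^T x / b^T x over spanning trees (b > 0)."""
    best = kruskal(n, edges, a)                      # any starting tree
    lam = sum(a[e] for e in best) / sum(b[e] for e in best)
    for _ in range(max_iter):
        w = [a[e] - lam * b[e] for e in range(len(edges))]
        tree = kruskal(n, edges, w)                  # parametric MST subproblem
        if sum(w[e] for e in tree) > -tol:           # F(lam) ~ 0: lam is optimal
            break
        best = tree
        lam = sum(a[e] for e in best) / sum(b[e] for e in best)
    return best, lam

# Toy triangle instance (made-up numbers); tree ratios are 1.5, 4/11, and 5/11.
edges = [(0, 1), (1, 2), (0, 2)]
tree, ratio = min_ratio_spanning_tree(3, edges, a=[1.0, 2.0, 3.0], b=[1.0, 1.0, 10.0])
```

Every tree produced by `kruskal` along the way is itself a spanning tree of the graph, which is why such intermediate trees can also be evaluated against the original multi-ratio objective.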

7.2.3 Partitioning the feasible region


There is a subtle, yet computationally very important, difference between
the cases $k = 2$ and $k > 2$. To solve (197) for $k = 2$, one can take advantage of the fact that
alternating $r = 1$ and $r = 2$ in

$$\min \; y_r \qquad (205)$$

$$\text{subject to} \quad L_i(Q) \le y_i \le U_i(Q), \quad i \ne r, \qquad (205a)$$

$$y \in Y \qquad (205b)$$

virtually rules out the necessity to partition $Q \subset \mathbb{R}^2$. Note also that $L_i(Q)$ may be set to $-\infty$, and
thus efficient procedures suggested in [6, 76] for solving (205) with only one side constraint may be
utilized. In fact, this is the strategy used in the algorithm by Skiscim and Palocsay [140]. Indeed,
suppose that $y^1$ is the solution to (205) for $r = 1$ and

$$Q = \{ y \in \mathbb{R}^2 : (l_1, l_2) \le y \le (u_1, u_2) \}.$$

Now $Q$ is reduced to

$$Q' = \{ y \in \mathbb{R}^2 : (y^1_1, l_2) \le y \le (e^{\top} y^1 - l_2, \; y^1_2) \}.$$

Let $y^2$ be a solution to (205) for $r = 2$ and $Q = Q'$.

If $e^{\top} y^2 < e^{\top} y^1$, then we can further reduce $Q'$ to

$$Q'' = \{ y \in \mathbb{R}^2 : (y^1_1, y^2_2) \le y \le (y^2_1, \; e^{\top} y^2 - y^1_1) \},$$

thus forcing $y^1 \notin Q''$, since $e^{\top} y^2 < e^{\top} y^1 \Rightarrow e^{\top} y^2 - y^1_1 < y^1_2$. Now that $y^1$ is separated from $L(Q'')$,
we can again solve (205) for $r = 1$ and $Q = Q''$ to further improve the bounds on the optimal
objective of (197).

If $e^{\top} y^2 > e^{\top} y^1$, then we can reduce $Q'$ to

$$Q''' = \{ y \in \mathbb{R}^2 : (y^1_1, y^2_2) \le y \le (e^{\top} y^1 - y^2_2, \; y^1_2) \},$$

forcing $y^2 \notin Q'''$, and we can proceed with solving (205) for $r = 2$ and $Q = Q'''$.

The only case when the algorithm cannot proceed is $e^{\top} y^2 = e^{\top} y^1$. In [140] the authors restart
the procedure by improving the upper bound, thus reducing $Q'$ and forcing both $y^1$ and $y^2$ outside
of the resulting rectangle. To achieve this, either a local search is performed, or, if the local search
fails to improve an incumbent solution, the procedure is applied recursively to
$\{ y \in Q' : y_s \le (L_s(Q) + U_s(Q))/2 \}$, $s = 1, 2$, until either a better incumbent is found or $\varepsilon$-optimality of the current
incumbent is proved.
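
The two reduction steps for $k = 2$ amount to simple corner updates, which the following sketch spells out. It takes the points $y^1$ and $y^2$ as given (solving (205) itself is not modeled), represents a rectangle as a pair of (lower, upper) corners, and returns `None` in the stalled case $e^{\top} y^2 = e^{\top} y^1$. The function names and the numbers are made up for this illustration.

```python
def reduce_after_r1(l, y1):
    """Q' from the lower corner l of Q and the minimizer y1 of the first ratio."""
    z1 = y1[0] + y1[1]
    return (y1[0], l[1]), (z1 - l[1], y1[1])

def reduce_after_r2(y1, y2):
    """Q'' or Q''' from y1 and the minimizer y2 of the second ratio over Q'."""
    z1, z2 = y1[0] + y1[1], y2[0] + y2[1]
    lower = (y1[0], y2[1])
    if z2 < z1:
        return lower, (y2[0], z2 - y1[0])     # Q'': cuts off y1
    if z2 > z1:
        return lower, (z1 - y2[1], y1[1])     # Q''': cuts off y2
    return None                               # equal objectives: procedure stalls

def contains(box, y):
    lower, upper = box
    return all(lower[s] <= y[s] <= upper[s] for s in range(2))

y1 = (1.0, 4.0)                               # minimizes the first ratio
q_prime = reduce_after_r1((0.0, 0.0), y1)     # ((1, 0), (5, 4))
y2 = (3.0, 1.0)                               # minimizes the second ratio in Q'
q_second = reduce_after_r2(y1, y2)            # ((1, 1), (3, 3))
```

After the second step $y^1$ no longer lies in the reduced rectangle, so the next solve with $r = 1$ must return a different image point or prove that none exists.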

Consider now the case $k > 2$. Let $Q \subset \mathbb{R}^k$ be such that

$$L_s(Q) = \min \{ y_s \in \mathbb{R} : y \in Y \cap Q \},$$

with $y^s$ being the respective optimal image point, $s = 1, \ldots, k$; and

$$U_s(Q) = \bar{z} - \sum_{s' = 1, \, s' \ne s}^{k} L_{s'}(Q), \quad s = 1, \ldots, k,$$

where $\bar{z} = \min \{ e^{\top} y^s : s = 1, \ldots, k \}$.

It is likely that $y^s \in Q$ for all $s = 1, \ldots, k$ when $k > 2$. This case is analogous to $e^{\top} y^2 = e^{\top} y^1$ for
$k = 2$ above, and the procedure by Skiscim and Palocsay [140] outlined above stalls. Improvement
of the upper bound, unless it is large enough (which cannot be guaranteed), does not restart
the procedure. Therefore, for $k > 2$ partitioning $Q$ is a vital step for the algorithm to proceed.
Moreover, it turns out that the way the feasible region is partitioned has a significant impact on
the computational performance of the algorithm. In particular, we would like to avoid solving (200)
with finite $L_r(Q)$. Suppose such a subproblem may have to be solved and

$$\exists \, Q', Q'' \in S : \; L_r(Q') \ge U_r(Q''), \quad L_s(Q') < L_s(Q'') < U_s(Q') \ \text{for some } s \ne r, \qquad (206)$$

i.e., the regions $Q'$ and $Q''$ are positioned as shown in Figure 13 with $r = 1$ and $s = 2$.

Figure 13: A two-dimensional illustration of condition (206) where $r = 1$.

As the following proposition implies, the situation described by the condition (206) may lead
to extremely inefficient computations. Assume that $B$ is defined as in (200b)-(200c), $L_r(Q) > -\infty$,
and $\mathcal{L}(x, v)$ is the fractional Lagrangian function of (200) obtained via dualizing the constraints
defining $B$. Then the following proposition is true.

Proposition 10 Let $\operatorname{conv}(T) \cap B \ne \emptyset$, and let $\tilde{x} \in T$ be such that $a_r^{\top} \tilde{x} / b_r^{\top} \tilde{x} < L_r(Q)$, and all other
inequalities that define $B$ are satisfied at $\tilde{x}$. Then

$$\sup_{v \ge 0} \; \inf_{x \in T} \; \mathcal{L}(x, v) = L_r(Q).$$

Proof: Take some $\hat{x} \in \operatorname{conv}(T) \cap B$. Let $\bar{x} = \alpha \hat{x} + (1 - \alpha) \tilde{x}$ for some $0 < \alpha \le 1$ such that
$a_r^{\top} \bar{x} / b_r^{\top} \bar{x} = L_r(Q)$. Since $B$ is convex, such $\bar{x}$ exists. Moreover, it is an optimal solution to the
linear relaxation of (200)

$$\min \; a_r^{\top} x / b_r^{\top} x \qquad (207)$$

$$\text{subject to} \quad x \in \operatorname{conv}(T) \cap B. \qquad (207a)$$

Indeed, $x \in B$ enforces the lower bound of $L_r(Q)$ on the objective, and this bound is achieved at $\bar{x}$.
On the other hand, it follows from the results in fractional duality [24, 71, 134] that

$$L_r(Q) = \inf \{ a_r^{\top} x / b_r^{\top} x : x \in \operatorname{conv}(T) \cap B \} = \sup_{v \ge 0} \inf \{ \mathcal{L}(x, v) : x \in \operatorname{conv}(T) \}.$$

Since $\mathcal{L}(x, v)$ is quasiconcave for any fixed $v \ge 0$, it achieves its minimum over $\operatorname{conv}(T)$ at some $x^*$
that is a vertex of $\operatorname{conv}(T)$. Thus $x^* \in T$ and

$$\sup_{v \ge 0} \inf \{ \mathcal{L}(x, v) : x \in T \} = \sup_{v \ge 0} \inf \{ \mathcal{L}(x, v) : x \in \operatorname{conv}(T) \} = L_r(Q).$$

□

Suppose that $Q'$, $Q''$ are defined as in (206), and $\tilde{T} \subset T$ is such that

$$\forall x \in \tilde{T}: \quad M(x) \in Q'' \ \text{ and } \ L_s(Q') \le M_s(x) \le U_s(Q'), \quad s = 1, \ldots, k, \; s \ne r.$$

The example displayed in Figure 13 shows $M(\tilde{T})$ as the image points encircled by a dashed line.
Then, if the CMRST subproblem (200) with the box constraints defined by $Q'$ is solved via the
procedure described in subsection 7.2.2, the lower bound on the optimal value of (200) obtained in
all nodes of the branch-and-bound tree will be equal to $L_r(Q')$ until at least one edge is excluded
for each $x \in \tilde{T}$. Not only may this be a weak bound; what is worse, it leaves the branching process
without direction for choosing the next node to process, thus dramatically increasing run time. To
rule out such a situation, we do not alternate the index $r$, but choose it
to be fixed in Algorithm 3. This way, the boxes can only be split by hyperplanes that are parallel
to the $r$-th coordinate axis. Hence, since we start with a single box, the projections of boxes in $S^j$
at any step $j$ of the algorithm onto any coordinate axes other than the $r$-th never overlap.

It should be clear that the run time of the main algorithm does depend on the choice of $r$, as
it depends on the shape of $\operatorname{conv}(Y)$. The index may be chosen, for example, by running a few iterations of
the algorithm for every $r = 1, \ldots, k$, and choosing the ratio along which the lower bound progresses
faster.

7.3  Computational  Experiments 

7.3.1  Setup 

All algorithms are implemented in C++ using the Microsoft Visual Studio 2003 environment. We rely
on the Boost Graph Library [139] implementation of adjacency list to represent graphs, and the
Mersenne Twister MT19937 [103] random number generator implementation from the Boost Random
Number Library [104]. The experiments were performed on a computer with an Intel® Core™ 2
Duo 3.16 GHz CPU and 3.23 GB of RAM.

The computational experiments were carried out for $k = 1, \ldots, 5$. We considered two types
of test instances: complete graphs and connected random graphs. For all graphs the parameters
$a^1_{ij}, \ldots, a^k_{ij}, b^1_{ij}, \ldots, b^k_{ij}$ for each edge $(i, j)$ of the graph are uncorrelated and follow the standard uniform
distribution.

7.3.2  Results  and  Discussion 

Tables 9 and 10 summarize the computational performance of the developed global optimization
algorithm for complete graph instances and sparse connected random graphs, respectively. The
probability of an edge in the latter type of instances is set to 0.1 when $|V| = 20$, and to 0.05 otherwise.
We tested a batch of five instances for each reported pair $(k, n)$. The target gap value is set to 1%,
and computation time is limited to 1 hour. Average run time as well as average final gap values are
reported for each batch. In addition to run time and gap, we report the average number of steps
(i.e., subproblems solved) performed by the algorithm in order to reach the final gap value, or until
the allotted time expires.

Table 9: Performance on complete graph instances

  k      n      steps    run time (sec.)    gap (%)
  2     10       12.4              0.1        0.6
  2     15       20.0              0.6        0.9
  2     20       26.0              2.6        0.9
  2     30       37.0             16.8        0.9
  2     40       28.6             37.0        0.9
  2     50       39.0             87.0        0.9
  2     80       65.4           1053.0        1.0
  2    100       65.8           2941.0        3.9
  3     10       51.0              1.1        0.9
  3     15      104.2             14.6        0.9
  3     20      190.4            108.0        1.0
  3     30      495.6           1801.0        1.0
  3     40      322.0           3465.0        1.9
  3     50       46.0           3600.0       16.2
  4     10      318.0             19.8        0.9
  4     15      999.4            441.0        1.0
  4     20     1773.2           3424.0        1.9
  4     30      198.6           3600.0       17.7
  5     10     1534.0            181.0        0.9
  5     15     4101.2           3600.0        2.9
  5     20      710.0           3600.0        8.9

It is evident that performance of the global optimization approach depends on both $k$ and $|T|$,
which have their impact on how difficult it is to build an approximation of $\operatorname{conv}(Y)$ that is accurate
enough. Furthermore, computational complexity of each iteration also depends on both of these
factors. As expected, the results suggest that $k$ primarily affects the number of iterations, and that
$|T|$ mostly affects the time per iteration. An encouraging empirical conclusion can be drawn from
Figure 14, which presents convergence of bounds on the optimal objective value for the hardest
tested instances. It turns out that an optimal or near-optimal solution is found by the algorithm
early, and most of the time is spent on proving the quality of an incumbent. This tendency is even
more obvious for easier instances. Most likely this should be attributed to the large number of trees
examined on each step of the algorithm. Therefore, when the size of the instance does not allow
proving near-optimality in a reasonable time, the suggested algorithm may still be used as a good
heuristic.

In  general,  the  developed  global  optimization  procedure  shows  consistently  good  performance 
on  the  instances  with  small  and  medium  graph  sizes  and  relatively  small  number  of  ratios  in  the 
objective  function.  However,  the  considered  approach  needs  substantial  improvement  in  order  to 
guarantee  a  near-optimal  solution  in  reasonable  time  for  large  scale  instances  and  large  number  of 
ratios  in  the  objective. 

Table 10: Performance on sparse random graph instances

Figure 14: Convergence of lower and upper bounds, represented by dashed and solid lines, respectively,
for the hardest complete graph instances.

7.4 Concluding Remarks

We  have  proposed  a  global  optimization  algorithm  for  solving  the  MMRST  problem.  The  developed 
method  has  evolved  from  the  ideas  used  by  Skiscim  and  Palocsay  [140]  to  solve  a  special  case  of 
MMRST.  The  algorithm  approximates  the  convex  hull  of  the  images  of  the  spanning  trees  by 
subdividing  the  image  space  and  minimizing  a  single  ratio  in  the  obtained  subregions.  Extensive 
computational tests reveal the strengths of the considered approach, as well as its limitations.

Several directions for further research are apparent from our results. Firstly, the
optimization  algorithm  can  be  adapted  to  other  combinatorial  problems  with  the  considered  form 
of  the  objective,  as  long  as  the  linear  version  of  the  problem  can  be  solved  rather  efficiently.  Secondly, 
instead  of  solving  the  constrained  minimum  ratio  spanning  tree  subproblem  to  optimality  on  every 
iteration,  it  may  be  beneficial  to  underestimate  its  optimal  value,  provided  that  we  can  guarantee 
convergence  of  the  bounds  for  such  method.  Also,  if  we  could  find  a  way  to  emphasize  proving  the 
quality  of  an  incumbent  solution  at  a  reasonable  computational  cost,  we  would  perhaps  significantly 
increase  overall  performance  of  the  global  optimization  procedure. 

8 Irregular Polyomino Tiling via Integer Programming

This  chapter  is  mostly  based  on  the  results  from: 

• S. Karademir, O.A. Prokopyev, “Irregular Polyomino Tiling via Integer Programming,” working
paper, 2011.

8.1  Introduction 

A  phased  array  antenna  is  composed  of  many  stationary  antenna  elements.  Phase  shift  and  time 
delay  are  the  key  ideas  behind  electronically  steering  the  beam  in  phased  array  antennas.  This 
technology  replaces  mechanically  steered  array  antenna  designs.  Details  on  theoretical  foundations 
of  antennas  and  review  of  current  antenna  technology  can  be  found  in  [56,  98,  122].  Ideally  one 
would  like  to  have  controls  (phase  shift  or  time  delay  depending  on  the  application  [99])  at  element 
level  but  it  is  too  expensive  to  implement  that  many  controls.  Therefore,  a  group  of  elements  are 
used  to  form  a  ‘subarray’  which  is  treated  and  controlled  as  an  oversized  element. 

‘Quantization sidelobes’ occur due to periodicity introduced by identical rectangular subarrays
used in practice. Simply speaking, a sidelobe is a beam (typically of a smaller magnitude) with a
direction other than the main beam direction of the antenna. Such undesired radiation reduces the
quality of the pattern generated by the antenna. It has been shown that using irregular polyomino-shaped
subarrays in the design of phased array antennas results in a significant reduction of quantization
lobes [99, 100]. Figure 15 from [100] illustrates that when polyominoes are used as subarray shapes,
sidelobe quantization is reduced to white noise level and the main beam is substantially more
significant.


Figure  15:  Comparison  of  3D  sidelobe  profiles  with  time  delay  at:  (a)  element  level,  (b)  rectangular 
subarray  level,  and  (c)  polyomino  shaped  subarray  level  [100]. 

In combinatorial geometry, a polyomino is a generalization of the domino and is created by
connecting a certain number of equal-sized squares [26, 69]. Fig. 16 shows the first five families of
polyominoes. The number of different polyominoes, excluding rotations and reflections, increases
very fast as n, the number of squares used, grows. There are 2, 108, and 63600 different polyominoes
for n equal to 3, 7, and 12, respectively. Polyomino tiling is computationally difficult since
even deciding whether a rectangular box can be exactly tiled by a set of given rectangles is
NP-complete [44]. Enumeration is the only technique that is typically applied in practice. Furthermore,
as previously stated, using ‘irregular’ polyomino tilings results in a major improvement in array
antenna performance [99, 100]. Therefore, another challenge that we face is to define a proper
measure of ‘irregularity’ of a tiling. We tackle these problems using a mixed integer programming
approach incorporating the concept of information-theoretic entropy. In particular, the ‘irregular’
polyomino tiling problem can be modeled as an entropy maximizing set partitioning/covering problem,
which can be formulated as a nonlinear mixed integer program (MIP).

This  chapter  briefly  reviews  our  current  progress  in  developing  the  solution  approaches  for 
solving  the  ‘irregular’  polyomino  tiling  problem.  In  particular,  the  remainder  of  this  chapter  is 
organized  as  follows.  In  Section  8.2  we  describe  our  metric  for  measuring  the  irregularity  of  a  tiling. 
Section  8.3  develops  respective  mathematical  programming  formulations.  Section  8.4  describes 
heuristic  procedures  that  can  be  used  to  solve  large-scale  tiling  problems.  Finally,  Section  8.5 
motivates  our  current  work. 


8.2  Information  Theoretic  Entropy  as  a  Measure  of  “Irregularity” 

We  need  a  metric  that  can  measure  the  irregularity  of  a  given  tiling.  Subsequently,  it  can  be 
incorporated  into  the  mathematical  programming  models  as  an  objective  function.  We  use  the 
Information  Theoretic  Entropy  concept.  Though  this  concept  has  found  applications  in  optimization 
and  graph  theory,  we  believe  our  work  is  the  first  to  use  it  as  a  measure  of  irregularity  in  the 
framework  of  the  polyomino  tiling  problem. 

Let X be a discrete random variable with n outcomes. Furthermore, let $p_i$ be the probability
of the ith outcome. Then the information theoretic entropy of X, H(X), is defined as

$$H(X) = -\sum_{i} p_i \log(p_i).$$

If $p_i = 0$, the value of the term $p_i \log p_i$ is assumed to be zero, which is consistent with
$\lim_{p \to 0^+} p \log p = 0$. It is easy to verify that H(X) is concave and its maximum occurs when
$p_i = 1/n$ for all i. This implies that the entropy attains its maximum for the uniform distribution.
To link this result to the entropy concept from statistical mechanics, consider each outcome of
X to be a micro-state that some system may occupy. The less we know about the micro-state that
the system occupies, the larger the entropy of the system. Intuitively, if each state occurs
with equal probability, then our knowledge about the system is minimal; hence, its entropy
is maximized. Conversely, consider a random variable with a single outcome having probability 1.
Then we have complete knowledge about the system state; thus, its entropy is 0.
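
The two extreme cases above can be checked numerically. The following is a minimal sketch, assuming the natural logarithm (which matches the value log(40) = 3.6889 quoted later in this section); the function name `entropy` and the example distributions are illustrative only.

```python
import math

def entropy(probs):
    """Entropy -sum p*log(p) with natural log; terms with p = 0 contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

n = 8
uniform = [1.0 / n] * n                        # every outcome equally likely
certain = [1.0] + [0.0] * (n - 1)              # a single outcome with probability 1
skewed = [0.5, 0.25, 0.125, 0.125, 0, 0, 0, 0]

h_uniform = entropy(uniform)   # equals log(n), the maximum
h_certain = entropy(certain)   # equals 0, the minimum
h_skewed = entropy(skewed)     # lies strictly between the two
```

Any distribution other than the uniform one gives a value strictly below log(n).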

Assume  each  polyomino  is  a  solid  body  with  some  center  of  gravity.  Figure  16  illustrates  the 
center  of  gravity  for  each  polyomino  as  a  black  dot  inside,  or  outside,  of  each  polyomino.  For 
simplicity  we  assume  that  each  center  of  gravity  is  located  exactly  at  the  center  of  one  of  the  squares. 

Intuitively, we expect that for regular tilings the locations of the centers of gravity also follow some
regular pattern. In particular, most of them should be located in certain rows and/or columns.
Based on this observation, we define a probability distribution over the rows and columns of any tiling
as follows: row i has probability $r[i]/(2T)$ and column j has probability $c[j]/(2T)$, where r[i] and c[j] are the
numbers of centers of gravity located in row i and column j, respectively, and T is the total number of
polyominoes used for the considered tiling. Since each center of gravity is counted exactly once
for rows and exactly once for columns, it is easy to verify the validity of this probability distribution.
Figure 17 stands as a proof of concept for our argument. Observe how the centers of gravity are aligned
when entropy is minimized. For the maximization case, one can easily notice that the centers of
gravity form an almost uniform distribution.
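
As an illustration, the metric can be computed directly from the list of centers of gravity. This is a sketch under stated assumptions: the row and column counts are normalized by 2T so that all probabilities sum to one (consistent with the bound log(40) quoted for the 20 x 20 board), the natural logarithm is used, and the two center layouts are hypothetical, not taken from the figures.

```python
import math
from collections import Counter

def tiling_entropy(centers):
    """Entropy of the joint row/column distribution of centers of gravity.

    centers: list of (row, col) pairs, one per polyomino.
    Assumption: counts are divided by 2T so rows and columns together sum to 1.
    """
    T = len(centers)
    row_counts = Counter(r for r, _ in centers)
    col_counts = Counter(c for _, c in centers)
    probs = [cnt / (2 * T) for cnt in row_counts.values()]
    probs += [cnt / (2 * T) for cnt in col_counts.values()]
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical center locations on a 4 x 4 board with T = 4 polyominoes.
aligned = [(0, 0), (0, 1), (0, 2), (0, 3)]   # all centers in one row
spread = [(0, 1), (1, 3), (2, 0), (3, 2)]    # one center per row and per column

h_aligned = tiling_entropy(aligned)
h_spread = tiling_entropy(spread)
```

The spread layout reaches log(4 + 4) = log(8), while the aligned layout scores lower because half of the probability mass sits on a single row.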

As mentioned above, the uniform distribution has the maximum entropy. Therefore,

$$H(X) \le H(\text{Uniform}) = -\sum_{i=1}^{n} \frac{1}{n} \log\left(\frac{1}{n}\right) = \log(n).$$

This theoretical upper bound is extremely important since it allows us to evaluate the
performance of the developed approximation or heuristic algorithms. Observe that for the board in
Figure 17 we have log(40) = 3.6889 (the 20 x 20 board has 20 rows and 20 columns, i.e., n = 40
outcomes), implying that the tiling obtained in the maximization case is close to the optimal solution.
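
A relative gap to this bound gives a simple quality measure for a heuristic solution. The sketch below is illustrative: the helper `entropy_gap` is a hypothetical name, the two entropy values are the ones reported for Figure 17, and the 40 outcomes are assumed to be the 20 rows plus 20 columns of the board.

```python
import math

def entropy_gap(h, outcomes):
    """Relative gap between entropy h and the upper bound log(outcomes)."""
    bound = math.log(outcomes)
    return (bound - h) / bound

# Entropy values reported for the 20 x 20 board (rounded to four decimals).
gap_min = entropy_gap(2.8842, 40)   # entropy-minimizing tiling
gap_max = entropy_gap(3.6889, 40)   # entropy-maximizing tiling
```

Because the reported values are rounded, the gap of the maximizing tiling is zero only up to rounding error.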

8.3  Mathematical  Programming  Formulations 
8.3.1  Formal  Setup 

Consider  a  rectangular  set  of  equal-sized  squares  (located  next  to  each  other)  and  a  polyomino  that 
covers  some  of  the  squares.  Observe  that  the  polyomino  type  (e.g.,  domino)  and  the  location  of  its 
north-western  corner  completely  determine  squares  covered  by  this  polyomino.  Hence,  it  is  natural 
to  model  our  polyomino  tiling  problem  as  a  set  partitioning  or  a  set  covering  problem:  all  squares 
form  the  ground  set  and  each  polyomino  with  the  location  of  its  north-western  corner  describes  some 
subset  of  the  ground  set.  Before  proceeding  with  the  mathematical  programming  formulations,  we 
define  the  notation  and  provide  the  formal  problem  statement. 

We will refer to the rectangular set of squares that is required to be tiled as a board. We can
visualize the board as a matrix since referring to its rows and columns is natural. Thus, (r, c)
denotes the square at the intersection of row r and column c. For a standard board we define two
regions: the “frame” and the “center.” The frame consists of a fixed number of rows and columns
that form the boundaries of the board. In general, the frame is not required to be tiled exactly. The
center of the board (inside the frame) consists of squares, each of which needs to be covered by exactly one
polyomino. If a perfect tiling of an m x n board is required, then the center of the respective board
is of size m x n.

To  avoid  the  necessity  of  mentioning  two  different  parameters  for  each  board,  whenever  we 
refer  to  its  dimensions  m  x  n,  the  values  of  m  and  n  define  the  size  of  the  center  of  the  board. 
Furthermore,  we  will  say  that  a  given  tiling  is  perfect  if  none  of  the  squares  of  the  board  frame  are 
covered.  Finally,  we  can  state  our  problem  more  formally: 

Figure 17: Validating entropy concept on pentomino family. (b) Maximizing entropy for a 20 x 20 board. (c) Minimizing entropy: H(X) = 2.8842. (d) Maximizing entropy: H(X) = 3.6889.


INPUT:  A  set  of  polyominoes  P  with  |P|  =  K  members,  m  x  n  board  B,  and  the  type  of  tiling: 
perfect  or  imperfect. 

OUTPUT:  The  most  irregular  (according  to  the  metric  defined  above)  tiling  of  board  B. 

Next,  we  develop  mathematical  programming  formulations  for  this  problem. 

8.3.2  Nonlinear  Set  Partitioning  Formulation

In the remainder of this chapter we assume that each polyomino covers exactly the same number
of squares of the board (i.e., all polyominoes have the same area). In general this assumption
can be easily relaxed.

Next  we  introduce  the  following  notation. 

• Define 0-1 variable $x^k_{pq} = 1$ iff the north-western corner of a polyomino of type k is located
at (p, q). For example, a monomino located at (p, q) would cover a single square {(p, q)};
similarly, a vertical domino located at (p, q) would cover {(p, q), (p + 1, q)}.

• Let $r_i$ and $c_j$ be continuous decision variables that denote the number of centers of gravity in
row i and column j, respectively. In fact, r and c are auxiliary variables that are completely
determined by the values of the x variables.

• Let $I_{ij}$ be the set of all triples (k, p, q) such that if a polyomino of type k is located at (p, q) then
it covers (i, j).

• Let $R_i$ be the set of triples (k, p, q) such that if a polyomino of type k is located at (p, q) then
its center of gravity is located in row i.

• Let $C_j$ be the set of triples (k, p, q) such that if a polyomino of type k is located at (p, q) then
its center of gravity is located in column j.

• Let T be the total number of polyominoes used for tiling the board. In fact, due to our
assumption T is constant since its value can be easily derived from the area to be covered
and the size of the polyomino family used. For instance, an exact tiling of an 8 x 8 board using
the tetromino family requires T = 8 · 8/4 = 16 polyominoes.
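
The index sets above can be enumerated mechanically. The following is a minimal sketch, not the authors' implementation: the two domino shapes, the helper names, and the rounding rule for the center of gravity (floor of the mean row and column) are assumptions made for illustration, and only placements lying fully inside the board are generated.

```python
import math
from collections import defaultdict

# Hypothetical polyomino types: cell offsets (dr, dc) measured from the
# north-western corner (p, q) of the bounding box.
SHAPES = {
    "domino_h": [(0, 0), (0, 1)],
    "domino_v": [(0, 0), (1, 0)],
}

def center_square(cells):
    """Assumed rounding rule: floor of the mean row and mean column."""
    mean_r = sum(r for r, _ in cells) / len(cells)
    mean_c = sum(c for _, c in cells) / len(cells)
    return math.floor(mean_r), math.floor(mean_c)

def build_sets(m, n, shapes):
    """Return (I_sets, R_sets, C_sets) for placements inside an m x n board."""
    I_sets = defaultdict(list)   # (i, j) -> triples covering square (i, j)
    R_sets = defaultdict(list)   # i -> triples whose center lies in row i
    C_sets = defaultdict(list)   # j -> triples whose center lies in column j
    for k, offsets in shapes.items():
        for p in range(m):
            for q in range(n):
                cells = [(p + dr, q + dc) for dr, dc in offsets]
                if not all(0 <= r < m and 0 <= c < n for r, c in cells):
                    continue     # placement sticks out of the board
                triple = (k, p, q)
                for cell in cells:
                    I_sets[cell].append(triple)
                gr, gc = center_square(cells)
                R_sets[gr].append(triple)
                C_sets[gc].append(triple)
    return I_sets, R_sets, C_sets

I_sets, R_sets, C_sets = build_sets(2, 2, SHAPES)
```

On the 2 x 2 board there are four feasible placements, and each square is covered by exactly two of them.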

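Given this notation, we provide the following nonlinear mixed integer programming (MIP)
formulation of the exact tiling problem:

$$
\begin{aligned}
P_{NL}:\quad \min\ & \sum_{i} \frac{r_i}{2T} \lg\left(\frac{r_i}{2T}\right) + \sum_{j} \frac{c_j}{2T} \lg\left(\frac{c_j}{2T}\right) \\
\text{s.to}\ & \sum_{(k,p,q) \in I_{ij}} x^k_{pq} = 1 && \forall\, i, j \\
& r_i = \sum_{(k,p,q) \in R_i} x^k_{pq} && \forall\, i \\
& c_j = \sum_{(k,p,q) \in C_j} x^k_{pq} && \forall\, j \\
& x^k_{pq} \in \{0, 1\},\ r_i, c_j \ge 0 && \forall\, i, j, p, q, k
\end{aligned}
$$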

8.3.3  Linear  Set  Partitioning  Formulation 

We use value disjunctions to reformulate $P_{NL}$ as a linear MIP that can be tackled using any standard
mixed integer programming solver (e.g., CPLEX).

• Define 0-1 variable $r_{it} = 1$ iff there are exactly t centers of gravity in row i.

• Similarly, define 0-1 variable $c_{jt} = 1$ iff there are exactly t centers of gravity in column j.

Then we obtain the following linear MIP denoted as $P_L$.


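$$
\begin{aligned}
P_L:\quad \min\ & \sum_{i=1}^{m} \sum_{t=1}^{T} \left(\frac{t}{2T} \lg \frac{t}{2T}\right) r_{it} + \sum_{j=1}^{n} \sum_{t=1}^{T} \left(\frac{t}{2T} \lg \frac{t}{2T}\right) c_{jt} \\
\text{s.to}\ & \sum_{(k,p,q) \in I_{ij}} x^k_{pq} = 1 && \forall\, i, j \\
& \sum_{t=1}^{T} t\, r_{it} = \sum_{(k,p,q) \in R_i} x^k_{pq} && \forall\, i \\
& \sum_{t=1}^{T} t\, c_{jt} = \sum_{(k,p,q) \in C_j} x^k_{pq} && \forall\, j \\
& \sum_{t=0}^{T} r_{it} = 1 && \forall\, i \\
& \sum_{t=0}^{T} c_{jt} = 1 && \forall\, j \\
& x^k_{pq},\ r_{it},\ c_{jt} \in \{0, 1\} && \forall\, i, j, p, q, k, t.
\end{aligned}
$$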
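
The value disjunction can be illustrated for a single row: once the count is encoded by one-hot indicators, the nonlinear term becomes a linear combination with constant coefficients. The sketch below assumes the normalization by 2T and the natural logarithm; the function names are illustrative only.

```python
import math

def phi(count, T):
    """Nonlinear term (count / 2T) * log(count / 2T), with 0 * log(0) = 0."""
    if count == 0:
        return 0.0
    p = count / (2 * T)
    return p * math.log(p)

def linearized(count, T):
    """The same value written as a linear function of one-hot indicators."""
    indicator = [1 if t == count else 0 for t in range(T + 1)]
    coeff = [phi(t, T) for t in range(T + 1)]   # constants of the linear objective
    # The two linking constraints for this row hold by construction.
    assert sum(indicator) == 1
    assert sum(t * indicator[t] for t in range(T + 1)) == count
    return sum(coeff[t] * indicator[t] for t in range(T + 1))

T = 16
values = [(phi(c, T), linearized(c, T)) for c in range(T + 1)]
```

For every admissible count the two expressions coincide, which is why the linear objective reproduces the nonlinear one on integral solutions.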


If there are K polyomino types used to tile an m x n board, we would have O(Kmn) variables.
Note also that both formulations can be easily modified to handle imperfect tilings. Figure 18
provides a perfect tiling of a 10 x 10 board and an imperfect tiling of a 9 x 11 board (there is no perfect
tiling for this board size) using the tetromino family.

Proposition 11 Variables $r_{it}$ and $c_{jt}$ in $P_L$ can be relaxed to be nonnegative, i.e., formulation $P_L$
is locally ideal.

Proof: Consider the second derivative of the function $\phi(x) = x \lg(x)$, which is continuous on (0, 1]:

$$\phi''(x) = \frac{1}{x} > 0.$$

Hence, $\phi(x)$ is strictly convex. Now consider the strictly convex function $\phi_T(x) = \frac{x}{2T} \lg\left(\frac{x}{2T}\right)$ defined
on $x \in (0, T]$. Using the constraints of the model ($\sum_t t\, r_t = A \le T$ and $\sum_t r_t = 1$) and Jensen's
inequality,

$$\phi_T(A) \cdot 1 = \phi_T\left(\frac{\sum_t t\, r_t}{\sum_t r_t}\right) < \frac{\sum_t \phi_T(t)\, r_t}{\sum_t r_t} = \sum_t \frac{t}{2T} \lg\left(\frac{t}{2T}\right) r_t$$

whenever more than one $r_t$ is nonzero. Thus, the lower bound on the left hand side is attained only
when $r_A = 1$. In the derivation above, we drop indices i and j for simplicity. □
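
The argument can be checked numerically for one row: any fractional split of the indicator weights with the same mean A costs strictly more than the integral choice. This is a sketch under assumptions (normalization by 2T, natural logarithm, T = 10 and A = 4 chosen arbitrarily), not part of the proof.

```python
import math

def phi(t, T):
    """(t / 2T) * log(t / 2T), with the convention 0 * log(0) = 0."""
    if t == 0:
        return 0.0
    p = t / (2 * T)
    return p * math.log(p)

T = 10
A = 4   # number of centers in the row, fixed by the x variables

# Integral choice: all weight on t = A.
integral_cost = phi(A, T)

# One fractional choice with the same mean: half weight on t = 2 and t = 6.
fractional = {2: 0.5, 6: 0.5}
fractional_cost = sum(phi(t, T) * w for t, w in fractional.items())

# Smallest excess cost over all two-point splits s < A < u with mean A.
smallest_excess = min(
    w * phi(s, T) + (1 - w) * phi(u, T) - integral_cost
    for s in range(0, A)
    for u in range(A + 1, T + 1)
    for w in [(u - A) / (u - s)]   # weight on s that keeps the mean at A
)
```

Since the objective is minimized, the relaxed indicators have no incentive to take fractional values.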

Figure 18: Optimal tilings using the tetromino family. (a) Perfect tiling of 10 x 10 board. (b) Imperfect tiling of 9 x 11 board.


8.4  Heuristics 

Since the linear MIP obtained above can solve exactly only tiling problems of rather small size, we
also develop a heuristic approach that can be applied to tile large-scale boards. In the algorithms
described below, solving $P_L$ for small board sizes serves as a subprocedure. To simplify the
description of the developed algorithms, we illustrate them with some simple examples.

8.4.1  Heuristic  Procedure:  Zoom-in 

Consider a board that consists of a single square. One could enlarge the board by replacing the
square by a 2 x 2 board and tile it using 4 unit squares. In general, given board B (not necessarily
rectangular) we can enlarge it by replacing each unit square of B by another rectangular board of
size a x b. We refer to this procedure as “zoom-in” with level z = (a, b), or z = (a x b). If B is
rectangular of size m x n, then we obtain a board of size (m · a) x (n · b).

Consider a set of polyominoes P. Assume that there exists a “zoom” level (a, b) (obtained
by replacing each square with an a x b rectangular board) such that each polyomino in P, enlarged
to that level, can be tiled exactly with polyominoes from P. Figure 19 illustrates this concept, showing
several pentominoes tiled with other pentominoes at the (5 x 5) “zoom” level. The respective tilings are
obtained by solving formulation $P_L$ exactly.

For any given initial tiling of size m x n, the “zoom-in” procedure can be used to generate tilings
of size $(m \cdot a^x) \times (n \cdot b^x)$ for any positive integer x. Procedure Zoom-in below provides the pseudo-code of
the algorithm.

Figure 19: Several pentominoes at (5 x 5) “zoom” level. (a) ‘F’ rotated 90°. (b) ‘T’ rotated 270°. (c) ‘W’.


Procedure Zoom-in

Input: m x n board B tiled exactly (e.g., via solving $P_L$) by some set of polyominoes $P_1$,
required “zoom” level z = (a, b), and exact tilings (e.g., also obtained via solving $P_L$)
of each polyomino in $P_1$ with some other set of polyominoes $P_2$ (possibly the same as $P_1$)
at the “zoom” level z.

1 begin Zoom-in

2     Replace each unit square of B with an a x b board.

3     Denote the final enlarged board by $B_z$.

4     Denote by $p_z$ in $B_z$ a polyomino $p \in P_1$ enlarged by z after the “zoom-in”.

5     foreach $p_z$ in $B_z$ do

6         Let $p_z^*$ be the perfect tiling of $p_z$ using $P_2$.

7         Replace $p_z$ with $p_z^*$ in $B_z$.

8 return Final tiling.
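
The procedure can be sketched on a tiling stored as a list of cell lists. This is an illustrative sketch, not the authors' code: the data layout, the helper names, and the hand-made tiling of the enlarged horizontal domino are assumptions; in the procedure above such sub-tilings would come from solving the exact formulation.

```python
def normalize(cells):
    """Translate cells so the bounding box starts at (0, 0)."""
    r0 = min(r for r, _ in cells)
    c0 = min(c for _, c in cells)
    return frozenset((r - r0, c - c0) for r, c in cells), (r0, c0)

def zoom_in(tiles, a, b, sub_tilings):
    """Enlarge every polyomino by (a, b) and replace it with its sub-tiling.

    sub_tilings maps a normalized shape to a perfect tiling of that shape
    enlarged by (a, b), given as cell lists in local coordinates.
    """
    result = []
    for cells in tiles:
        shape, (r0, c0) = normalize(cells)
        for sub in sub_tilings[shape]:
            # Shift local coordinates to the enlarged position of this polyomino.
            result.append([(r0 * a + r, c0 * b + c) for r, c in sub])
    return result

# Hypothetical example: a horizontal domino at zoom level (2, 2) becomes a
# 2 x 4 rectangle, tiled here by four dominoes.
H_DOMINO = frozenset({(0, 0), (0, 1)})
SUB_TILINGS = {
    H_DOMINO: [
        [(0, 0), (1, 0)],
        [(0, 1), (0, 2)],
        [(1, 1), (1, 2)],
        [(0, 3), (1, 3)],
    ],
}
board = [[(0, 0), (0, 1)], [(1, 0), (1, 1)]]   # 2 x 2 board, two dominoes
zoomed = zoom_in(board, 2, 2, SUB_TILINGS)     # 4 x 4 board, eight dominoes
```

Applying the function repeatedly requires a sub-tiling for every shape that appears in the intermediate tilings, here also the vertical domino.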


8.4.2  Heuristic  Procedure:  Magnify 

Consider a set of polyominoes P and some m x n board tiled exactly by solving formulation $P_L$. If one
requires tilings of other board sizes, procedure Magnify is designed to serve this aim. Unfortunately,
the resulting tiling is not necessarily perfect.

Assume that we are required to tile a board of size M x N, where $M \gg m$ and $N \gg n$.
Suppose there exists a “zoom” level (a, b) (i.e., obtained by replacing each square with an a x b
rectangular board) such that each polyomino in P can be tiled exactly with polyominoes from P. One
can simply first enlarge the original tiling until $m \cdot a^x \ge M$ and $n \cdot b^x \ge N$, and then drop the
polyominoes that are completely outside the board of the required size. Otherwise, we can simply
paste copies of the initial m x n tiling side by side until we cover a board at least as large as the required
size and then drop the polyominoes that are completely outside.

Figure 20 shows the solutions of Figure 18 pasted side-by-side and “zoomed-in.” Initially, the
tiling of Figure 18(a) is enlarged to 50 x 50 and the tiling of Figure 18(b) is enlarged to 45 x 55. Both
tilings are then reduced to 45 x 45.

Procedure Magnify

Input: (1) Required tiling dimensions (M x N). (2) Either (2a) a small (not necessarily
perfect) tiling of size m x n together with the perfect tiling of each polyomino at
some “zoom” level (a x b), or (2b) a small perfect tiling of size (m x n).

1 if Tiling of each polyomino at “zoom” level a x b exists then

2     Let x be the smallest integer satisfying $M \le m \cdot a^x$ and $N \le n \cdot b^x$

3     “Zoom-in” x times (Procedure Zoom-in)

4     Draw a rectangle of size M x N inside the obtained tiling

5     Drop any polyomino that lies completely outside the M x N rectangle

6 else

7     Let y and x be the smallest integers satisfying $M \le m \cdot y$ and $N \le n \cdot x$

8     Create a horizontal strip by pasting x copies of the m x n perfect tiling side by side

9     Paste y copies of the obtained strip vertically to get an (m · y) x (n · x) perfect tiling

10     Draw a rectangle of size M x N inside the obtained tiling

11     Drop any polyomino that lies completely outside the M x N rectangle

12 return Final tiling.
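
The pasting branch of the procedure can be sketched as follows. This is a minimal illustration under assumptions: tilings are stored as lists of cell lists, the M x N rectangle is anchored at the top-left corner of the pasted tiling, and the function name is hypothetical.

```python
def paste_and_crop(tiles, m, n, M, N):
    """Paste copies of an m x n tiling and crop to an M x N rectangle.

    Keeps every polyomino with at least one cell inside the rectangle,
    i.e. drops only those that lie completely outside.
    """
    y = (M + m - 1) // m   # smallest y with M <= m * y (vertical copies)
    x = (N + n - 1) // n   # smallest x with N <= n * x (horizontal copies)
    result = []
    for cy in range(y):
        for cx in range(x):
            for cells in tiles:
                shifted = [(r + cy * m, c + cx * n) for r, c in cells]
                if any(r < M and c < N for r, c in shifted):
                    result.append(shifted)
    return result

# Hypothetical example: a 2 x 2 tiling by two horizontal dominoes,
# magnified to cover a 3 x 3 board.
small = [[(0, 0), (0, 1)], [(1, 0), (1, 1)]]
magnified = paste_and_crop(small, 2, 2, 3, 3)
covered = {cell for t in magnified for cell in t}
```

In this example six dominoes remain and three of them stick out of the 3 x 3 rectangle, so the result is an imperfect tiling.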


Figure 20: Initial solutions of Figure 18 magnified. (a) Initial perfect tiling pasted side-by-side. (b) Initial imperfect tiling zoomed-in.


8.4.3  Heuristic  Procedure:  Retile 

Procedure Retile is an important part of our heuristic. Consider board B and assume one of its
tilings is given (e.g., obtained via any of the procedures described above). Let (r, c) be some square
of B. Consider a smaller board S of size d x d centered at (r, c). Some polyominoes
cross the boundaries of S and others are completely inside S. We may fix the polyominoes
crossing the boundaries and retile the area covered by the others. Retiling starts with the existing
tiling of S (i.e., the polyominoes that are completely inside S form a feasible solution for S); therefore,
we are always guaranteed to have a solution that is at least as good as the original one. Hence,
we may improve the irregularity of a very large tiling by solving smaller MIPs.


Procedure Retile

Input: Tiling of m x n board B, d (d < min{m, n}), (r, c) for S, and the type of retiling
(perfect vs. imperfect).

1 Draw a square S of dimension d x d centered at (r, c) of B.

2 Divide S into three regions: Center, CoveredFrame, and FreeFrame.

3 Center consists of squares in S that: (i) are covered by polyominoes that are completely in S
and (ii) do not belong to the frame of B.

4 CoveredFrame consists of squares in S that: (i) do not belong to the Center and (ii) do not
belong to the frame of B.

5 FreeFrame consists of squares in S that belong to the frame of B.

6 Mark the Center as to be covered exactly.

7 Mark the CoveredFrame as not to be covered.

8 if retiling type is ‘perfect’ then

9     Mark the FreeFrame as to be “penalized” if covered.

10 else

11     Mark the FreeFrame as to be packed on.

12 Using the appropriate entropy maximizing mathematical model, tile S.

13 return Final tiling.
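
The first steps of the procedure amount to separating the polyominoes that may be retiled from those that stay fixed. The sketch below shows only that classification; the data layout, the function name, and the placement of the window for even d are assumptions, and the frame of B is ignored for brevity.

```python
def split_for_retile(tiles, r, c, d):
    """Split polyominoes relative to the d x d window S centered at (r, c).

    Returns (inside, crossing): polyominoes completely inside S, which may
    be retiled, and polyominoes crossing the boundary of S, which stay fixed.
    Polyominoes with no cell in S are ignored.
    """
    top, left = r - d // 2, c - d // 2

    def in_window(cell):
        return top <= cell[0] < top + d and left <= cell[1] < left + d

    inside, crossing = [], []
    for cells in tiles:
        hits = sum(in_window(cell) for cell in cells)
        if hits == len(cells):
            inside.append(cells)
        elif hits > 0:
            crossing.append(cells)
    return inside, crossing

# Hypothetical example: a 4 x 4 board tiled by horizontal dominoes,
# with a 3 x 3 window centered at (2, 2).
board = [[(row, col), (row, col + 1)] for row in range(4) for col in (0, 2)]
inside, crossing = split_for_retile(board, 2, 2, 3)
```

The squares covered by `inside` play the role of the Center, while the squares of S covered by `crossing` form the CoveredFrame.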


8.4.4  Heuristic  Procedure:  Smoothen 

We can slightly modify the objective function of formulation $P_L$ by “penalizing” tiled squares in
the frame of board B. Procedure Smoothen takes as its input an imperfect tiling (possibly a very
large one) of B. It then attempts to obtain a perfect tiling of B by “moving” along its frame and
retiling B while “penalizing” squares in the frame of B that are tiled.

Procedure Smoothen

Input: An imperfect tiling of m x n board B.

1 Let d x d, d < min{m, n}, be a square that can be tiled in a short time using standard MIP
solvers.

2 Consider the following closed rectangular path P on B:

(d, d) → (d, n - d) → (m - d, n - d) → (m - d, d) → (d, d).

3 Let δ be the step size along the path. // Usually δ < d/2

4 Using the d x d board and step size δ, retile B perfectly (Procedure Retile) along the path P.

5 return Final tiling.
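
The window centers visited along the closed path can be generated as follows. This is an illustrative sketch: the function name is hypothetical, the step is assumed to be a positive integer, and the coordinates follow the path formula literally.

```python
def frame_path(m, n, d, step):
    """Window centers along (d,d) -> (d,n-d) -> (m-d,n-d) -> (m-d,d) -> (d,d)."""
    corners = [(d, d), (d, n - d), (m - d, n - d), (m - d, d), (d, d)]
    path = [corners[0]]
    for (r0, c0), (r1, c1) in zip(corners, corners[1:]):
        r, c = r0, c0
        while (r, c) != (r1, c1):
            # Move at most `step` squares toward the next corner, so every
            # corner of the path is visited exactly.
            r += max(-step, min(step, r1 - r))
            c += max(-step, min(step, c1 - c))
            path.append((r, c))
    return path

# Hypothetical example: 10 x 12 board, window size d = 3, step size 2.
path = frame_path(10, 12, 3, 2)
```

Each returned point would serve as the center (r, c) of one retiling call.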


8.4.5  Heuristic  Procedure:  Randomize 

Procedure Randomize is another building block of our heuristic. Given some tiled board B (typically
very large), it traverses B and applies Procedure Retile to “randomize” (i.e., to increase the
irregularity, according to the developed metric, of) the subregions in B. Figure 21 illustrates the application
of Procedures Retile and Randomize starting from the tiling of Figure 20(a) to obtain a near-perfect
irregular tiling.

Figure 21: Perfect tiling of Figure 18(a) magnified, randomized and smoothed. (a) Perfect tiling from 18(a) pasted side-by-side: 6 % gap with respect to the theoretical upper bound on irregularity. (b) Tiling from 21(a) randomized: 0.2 % gap with respect to the theoretical upper bound on irregularity. (c) Tiling from 21(b) smoothed along the boundaries.

Figure 22: 12 x 16 board magnified to 48 x 64 board in less than 1 minute. (a) Members of octomino family that are used. (b) 12 x 16 tiling. (c) Tiling in (b) “zoomed-in” by (4,4).

Figure 23: Final tiling: original 12 x 16 board magnified to 96 x 128 and randomized. (a) Tiling from 22(c) pasted side-by-side to obtain 96 x 128 board. (b) Tiling from 23(a) randomized. Final gap 0.37 % with respect to the theoretical upper bound on irregularity.


Procedure Randomize

Input: Tiled m x n board B and the type of the required tiling (perfect vs. imperfect).

1 Let d x d, d < min{m, n}, be a square that can be tiled in a short time using standard MIP
solvers.

2 Let δ be the step size. // Usually δ < d/2

3 Consider the following path P on B:

(d, d) → (d, d + δ) → ... → (d, n - d) → (d + δ, d) → ... → (d + δ, n - d) → ... → (m - d, n - d).

// If (n - 2d)/δ is not integral: ... (d, d + δ⌊(n - 2d)/δ⌋) → (d, n - d) ...

4 Using the d x d board and step size δ, retile B (Procedure Retile) along the path P, enforcing the
required type of tiling.

5 return Final tiling.
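
The row-by-row path, including the extra hop when (n - 2d)/δ is not integral, can be generated as follows. This is a sketch with hypothetical function names; it assumes integer coordinates, a positive integer step, and d small enough that d ≤ m - d and d ≤ n - d.

```python
def axis_stops(lo, hi, step):
    """Positions lo, lo + step, ... below hi, always ending exactly at hi."""
    stops = list(range(lo, hi, step))
    stops.append(hi)   # final hop to the end point, even if shorter than step
    return stops

def raster_path(m, n, d, step):
    """Window centers visited row by row, each row scanned left to right."""
    return [
        (r, c)
        for r in axis_stops(d, m - d, step)
        for c in axis_stops(d, n - d, step)
    ]

# Hypothetical example: 8 x 9 board, d = 2, step size 2.
# Here (n - 2d) / step = 5 / 2 is not integral, so each row ends with a
# shorter hop from column 6 to column 7.
path = raster_path(8, 9, 2, 2)
```

Each point of the path would be passed to Procedure Retile as the center of the d x d window.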

Figures 22 and 23 illustrate the application of the developed heuristic to another problem instance.
Comparison with the theoretical upper bound indicates that the algorithm is rather successful in
obtaining an irregular tiling.

8.5  Current  Work  and  Concluding  Remarks 

The heuristic described above can be applied to obtain arbitrarily large tilings, though not
necessarily perfect ones. Observe that the exact MIP formulation $P_L$ is at the core of the algorithm. Therefore, we
are currently working on developing more advanced exact solution approaches that will be able
to solve exact tiling problems for larger problem sizes. There are three distinct research directions:

(i) another set partitioning formulation (we are currently performing some preliminary
computational tests);

(ii) a better branching strategy (specifically using constraints of $P_L$);

(iii) a more advanced branch-and-price algorithm.

The PI expects that the first paper on the topic will be submitted for publication within the next
two to three months. The target journal is INFORMS Journal on Computing.